Industrial control systems can be so fragile that the mere act of investigating a malware infection can be enough to crash them and bring the process they support to a halt.
That was the message from Mark Fabro, a forensic investigation expert and president of Lofty Perch, speaking at the 4SICS industrial control system conference in Stockholm last week.
Fabro's company has specialised in investigations of industrial control systems (ICS) since its formation in 2005. He co-authored the recommended practice document of the US Department of Homeland Security's Industrial Control Systems Cyber Emergency Response Team (ICS-CERT) with another ICS cyber-security expert, Eric Cornelius, director of critical infrastructure and industrial control systems at Cylance Inc.
Many industrial control systems were built when there was no expectation that they would be connected to the internet, and before malware was as active as it is now. Viruses that would otherwise be benign in the IT world can have extreme, catastrophic consequences in the control system domain, Fabro told SCMagazineUK.com.
“Simple denial of service, zombying or botting – viruses that we can get in IT that are not much more than a nuisance, when you see the tactical implementation of those viruses and the kinetic impact they can have on these older systems, it can be very bad,” he said.
Fabro has extensive experience investigating malware on active production systems, and the biggest challenge he has found is the need to work on systems that cannot be taken out of production. As a result, many traditional IT forensic techniques can't be used because they require you to work on systems that can be isolated, imaged, bagged and tagged.
“We have to have a live analysis of the system at that time and do a comparative analysis against what we expect the norms to be in the systems,” he said.
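The comparative analysis Fabro describes can be illustrated with a minimal sketch. It assumes the investigator already holds a baseline list of expected processes and a snapshot gathered from the live system by some means the plant engineer has approved. The process names and the `compare_to_baseline` helper are hypothetical and do not come from Fabro's own tooling.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Deviation:
    kind: str  # "unexpected" (seen but not in baseline) or "missing"
    item: str


def compare_to_baseline(baseline: set[str], observed: set[str]) -> list[Deviation]:
    """Return the differences between the expected norm and a live snapshot."""
    unexpected = [Deviation("unexpected", name) for name in sorted(observed - baseline)]
    missing = [Deviation("missing", name) for name in sorted(baseline - observed)]
    return unexpected + missing


# Hypothetical data: what the system normally runs versus what was seen.
baseline_processes = {"hmi.exe", "historian.exe", "opcserver.exe"}
observed_processes = {"hmi.exe", "opcserver.exe", "svch0st.exe"}

for deviation in compare_to_baseline(baseline_processes, observed_processes):
    print(f"{deviation.kind}: {deviation.item}")
```

The comparison itself runs on the investigator's own machine, so only the collection of the snapshot touches the production system.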
ICS also have protective measures built into them to help prevent piracy. When these systems detect suspicious activity which could be an attempt to copy the software, they shut down.
Investigators often have to work within times and dates set by the system owner, so it's not unusual to be working from 2am to 5am to avoid affecting vital systems, under the gaze of a system engineer who ensures the system continues to function.
For instance, in a public transport environment – a typical 24/7 operation – it was impossible to take the system offline altogether, but working in the wee hours of the morning meant that if a system were to shut down as a result of the investigation, it would affect fewer travellers.
Every control system has its own characteristics and foibles, he said. ICS which run on newer operating systems tend to be more stable, but he said some of the systems he deals with are 25 to 30 years old.
“When you get into older systems on older platforms, there's an enormous amount of unpredictability on how the investigation process is going to be received by the system itself,” Fabro said. “Traditional investigation methods such as inserting USBs or trying to do memory captures – these are elements of investigation that these control systems are historically not used to. And it becomes very hard to predict what the very action of going to observe the state of the production system is going to have.
“The production system could hang, it could start to trip up, it could start to fail.”
A forensic investigation involves two steps, he said: firstly, getting to the assets themselves and identifying the indicators of compromise and, secondly, gaining access to the communications infrastructure.
“Getting into the communications infrastructure can be tricky because you are introducing yourself into a communications system that has historically only seen control systems traffic. Investigation techniques, trying to probe and do analysis for deviations of that normal operational envelope, introduce new things to the network that the older systems may not know how to deal with,” he said.
Enumeration packets looking for services, attempts to gain access to the human-machine interface, or use of a DOS prompt to retrieve state information or a process table – any of these could confuse the system.
“An instance of that would be that that system which has never actually had to entertain those type of queries would be forced to try and move memory around and the memory that it now wants to allocate to serve up the investigatory information like the state table may take it away from managing the process which may cause certain processes to fail,” he said.
In transportation infrastructure, this could manifest itself in failure of field equipment such as signals or the introduction of non-control system traffic in the network that causes packet delays.
He says he can't overstate the importance of understanding the infrastructure before commencing an investigation. In one case, a client had to call him in after their own IT team had crashed the system.
“A simple example is a manufacturing facility that involves chemicals and polymers,” he said. “There was a malware that was actually in the production facility that was residing on a master control system server. The only reason that the virus was able to live there was that the virus required that the administrator shares were on.”
The IT department was sent to investigate whether the virus had infected any other systems.
“Without doing the necessary investigations on the attributes of the virus itself, the team went and – according to their own internal protocol – went to every device in the control system environment, logged in as the administrator and turned on the shares to see whether the virus was there.
“Well, the virus wasn't there but as soon as the administrator shares were on, the virus from the host machine saw all the shares open up and then went to all the other servers – which did bring the production facility to a complete stop.”
Because the plant made polymers, the stoppage caused the product to harden in the valves – highlighting the need for appropriate subject matter expertise when dealing with these complex and non-standard systems.
He recommends that companies which are setting up incident response teams ensure that they include both IT and control system engineers.
“We know for a fact that some traditional methods for IT incident response and forensics will work in control system environments. It is mandatory that there is an inclusion and input of subject matter expertise from a control systems engineering perspective,” he said.
Why aren't these systems being upgraded? “There are two reasons for that: first the system is functioning and nothing has happened. And the other thing is... there is a window required where the system (needs to be) offline to actually upgrade them,” he said. “Those windows for forklift reinstalls in the factory are few and far between.”
With systems running on Windows XP and NT, the lack of ongoing support increases the opportunity for zero-day exploits for which there are no malware signatures. “Fortunately the community of interest has come up with things like application white listing and kernel locking which do things like prevent unauthorised applications or code to write to memory,” Fabro told SC.
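The idea behind application whitelisting can be shown with a minimal sketch: a program is allowed to run only if its hash appears on a list of approved hashes. Real products enforce this inside the operating system; the `is_allowed` function and the sample byte strings here are hypothetical and illustrate only the decision logic.

```python
from __future__ import annotations

import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_allowed(binary: bytes, allowlist: frozenset[str]) -> bool:
    """Permit execution only when the binary's hash is on the approved list."""
    return sha256_of(binary) in allowlist


# Hypothetical stand-ins for the contents of executable files.
approved_binary = b"approved control application"
allowlist = frozenset({sha256_of(approved_binary)})

print(is_allowed(approved_binary, allowlist))        # True
print(is_allowed(b"unknown or tampered", allowlist))  # False
```

Because the check depends on a list of known-good software rather than on malware signatures, it does not need to recognise a zero-day exploit in order to refuse it.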