John Walker

In the last year we have seen several high-profile sites lose service availability, with the outages attributed to problematic system upgrades impacting operations.

In October 2014 the Bank of England was forced to temporarily suspend its real-time Clearing House Automated Payments System (CHAPS) for 10 hours after a technical glitch attributed to routine maintenance. Unlike the January Facebook event, which disrupted users' social media activities for an hour, the CHAPS outage directly impacted a critical payment system that services the real-time financial requirements of millions of end users and businesses to the tune of £277 billion per day. It delayed transactions on 2,450 home purchases and stalled business transactions for both SMEs and corporate organisations.

In Southern California back in 2011, a maintenance worker caused the loss of a transmission line operated by Arizona Public Service, resulting in a massive power outage that impacted critical services, including traffic lights; caused 3.5 million gallons of sewage to be spilled into the ocean; had indirect implications for SCADA environments; and took two nuclear reactors offline, compounding the loss of electricity.

The lesser evil?
If we accept that critical systems are suffering such devastating outages, does this imply that the prerequisites stipulated by good security practices and standards have not been followed? And that the testing of such changes has not been provisioned to a level commensurate with such critical systems, through a lifecycle of Test, User Acceptance and the eventual promotion to Production of a stabilised change?

Looking at the Facebook outage, for which hacking group Lizard Squad claimed responsibility, could it be that in some circumstances the excuse of a maintenance failure is the lesser evil compared to an admission of suffering a system compromise?

"Could maintenance failure be an excuse?"

Holly Smith, a solicitor with Buckles LLP, commented on CHAPS: “The system downtime of CHAPS resulted in failures to meet contractual deadlines. Most contracts contain interest payments and penalties for late payments and parties had to rely on the goodwill of their counterparties not to enforce such penalties. A survey conducted by the Law Society indicated that 30 percent of residential transactions were unable to complete until the next day or later – this system outage resulted in a very real impact on both businesses and individuals alike.”

Peter Wood, CEO of First Base Technologies LLP told SC: “I suspect that many of these organisations have a problem with ‘agile development' – the need to push changes out as quickly as possible – which frequently results in inadequate testing and QA. Traditional change control processes are sometimes seen as too cumbersome and slow, resulting in code being released that may not have been adequately tested. It's equally possible that some organisations believe that the negative publicity associated with ‘system update problems' is far less damaging than being a victim of a hacker group.”

Whether the cause is a flawed system upgrade or an event driven by a hack, both circumstances imply that security issues are in play – be it a failure to follow a robust lifecycle of change controls, or a failure to provision adequate system defences. Both failures are equally unacceptable.

Conversely, claims made by hackers may be attempts to take credit for what is actually a system failure. But in terms of public acceptability, would admission of a faulty change represent a lesser evil than the real circumstances behind a loss of service? Scapegoats or not, when it comes to who pays the price for lacklustre security, at the end of the day it is always the end users.

John Walker, visiting professor at Nottingham Trent University