Peter Groucutt, managing director, Databarracks

Last month (May 2017) a massive power supply failure in a west London data centre caused a global crash of BA's IT systems. The ensuing chaos disrupted staff and grounded travellers in more than 70 countries, as thousands of flights were cancelled, impacting 75,000 customers. 

Whilst the exact chain of events is still being determined and scrutinised (various reputable experts have questioned the legitimacy of the power surge explanation), an incident of this scale brings two things into stark focus: (1) there are no absolute certainties in IT resilience, even for the world's largest organisations, and (2) the subsequent cost of IT downtime can be both hard to quantify and incredibly far reaching.

BA's outage is bewildering because of its apparent simplicity. Recent reports claim that the incident was the result of human error, with an on-site contractor performing an unauthorised disconnection and reconnection of a critical power supply. To make an old but salient point about resilience: organisations must align the degree to which they protect systems and assets with their relative criticality to the business. 

Uptime is utterly essential in the airline industry, where thousands of time-critical processes leave very little margin for error. Any system failure could cause massive ripple effects across BA's entire fleet, with costs escalating rapidly for as long as the outage persists. So how might we begin to quantify those costs?

There are two figures to consider when quantifying the cost of IT downtime – known, tangible costs (such as loss of income, regulatory fines and compensation), and hidden, intangible costs (such as reputational damage and customer defection). 

For British Airways, industry experts have suggested that the airline will be forced to pay as much as £150 million in compensation. This figure represents the known costs: pre-set pay-outs for missed flights and associated out-of-pocket costs for affected customers, such as hotels and food. BA will also need to include the cost of an internal review of its IT systems, and of productivity losses throughout the incident.

For other industries where customer compensation figures are not agreed, these costs are more difficult to calculate. 

As might be expected, the “hidden” costs of an outage are harder to calculate. As a publicly listed company, BA has been able to quantify the cost of reputational damage: €400 million was wiped off the share value of its parent company, International Airlines Group (IAG). Reputational damage means more than share value, however. Consumer airlines operate in a fiercely competitive market, and whilst experts have said that the strength of BA's brand will carry it through the subsequent fallout, there is likely to be an initial dip in sales.

For private companies without the immediate feedback of share price to calculate reputational damage, it's crucial to attempt to put real figures to these unknown costs. An incident might result in an X% reduction in sales for Y months, which will equal a loss of £Z, for example.
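The estimate described above is simple arithmetic, and writing it down makes the assumptions explicit. The following is a minimal sketch only: the function name `estimated_sales_loss` and every input figure are hypothetical illustrations, not numbers relating to BA or any real organisation.

```python
def estimated_sales_loss(monthly_sales: float, reduction_pct: float, months: float) -> float:
    """Rough estimate of sales lost to reputational damage after an incident.

    monthly_sales: typical monthly sales before the incident (in pounds)
    reduction_pct: assumed percentage drop in sales (0 to 100)
    months:        assumed number of months the drop lasts
    """
    if monthly_sales < 0 or months < 0 or not 0 <= reduction_pct <= 100:
        raise ValueError("inputs must be non-negative, with reduction_pct between 0 and 100")
    return monthly_sales * reduction_pct * months / 100


# Hypothetical inputs: £2m of monthly sales, an assumed 5% dip lasting 3 months.
loss = estimated_sales_loss(2_000_000, 5, 3)
print(f"Estimated lost sales: £{loss:,.0f}")
```

The value of the exercise lies less in the precision of the result than in forcing an explicit, reasoned assumption for each input, which can then be challenged and refined.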

Providing this kind of reasoned estimate can be hard for private companies without the public reputation or direct correlation between customer experience and brand loyalty enjoyed by BA.  

Nevertheless, it's an essential step towards an accurate understanding of the true cost of IT downtime. The exercise can also be used to make the case for continued board-level investment in IT resilience. This is particularly applicable to BA, as recent findings from a major shareholder advisory group warned BA's parent company, IAG, of the need to appoint persons with IT knowledge to its board. Organisations with limited knowledge of IT at senior levels should not be dissuaded from discussing resiliency and the cost of downtime, but should rather attempt to bridge knowledge gaps with approachable language and compelling fiscal reasoning.

While this failure may have had nothing to do with cyber-attacks, the consequences are almost identical to those of a shutdown caused by a 'successful' cyber-attack, so the lessons learned apply in both cases.

Contributed by Peter Groucutt, managing director, Databarracks

*Note: The views expressed in this blog are those of the author and do not necessarily reflect the views of SC Media or Haymarket Media.