Oscar Arean, technical operations manager, Databarracks

Recently, the results of our 2015 Data Health Check (a survey of over 400 UK IT decision makers; full report to be released in August) revealed that an extremely high proportion of respondents had IT-specific disaster recovery (DR) plans in place alongside their existing business continuity (BC) plan.

While these findings are encouraging, what often isn't discussed is how difficult organisations find it to stick to these plans during a real disaster.

The New York Stock Exchange (NYSE) recently sparked controversy after it was forced to suspend trading for three hours following a major technical glitch, rather than failing over to its Chicago-based recovery site. When pressed on the decision, NYSE president Tom Farley said it was "never really an option" to implement its disaster recovery plan and that it would only ever fail over to the Chicago site in the event of a "catastrophe".

Truth be told, choosing whether to fail over to a secondary site or try to fix the problem in-house is a decision many organisations struggle with. Most organisations will have a DR and BC plan in place that specifies exactly how long an outage can last before DR needs to be invoked, but many struggle to apply this during a real-life incident. In the rush to tackle a disaster, it's very easy to let timings slip.

For those looking for guidance, it's important to understand that failover points are individual to each organisation and differ from disaster to disaster. A business might have a set response for a storage-related issue, and this could differ entirely from its response to a network issue, for example. It's up to your Crisis Management Team (CMT) to identify the most likely disaster scenarios across your business and to plan for each of these, including identifying the point at which you should move to your recovery site.
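
As a rough illustration of what a per-scenario failover point can look like once it is written down, the sketch below records a maximum tolerable outage for each scenario and checks an ongoing incident against it. The scenario names, the thresholds and the `should_invoke_dr` helper are all hypothetical, chosen only for illustration; real values would come from your own CMT and BC plan.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ScenarioPolicy:
    """Failover point for one disaster scenario (illustrative only)."""
    name: str
    max_outage: timedelta  # hypothetical threshold agreed in advance by the CMT


# Assumed example values; every organisation will set its own.
POLICIES = {
    "storage": ScenarioPolicy("storage", timedelta(hours=2)),
    "network": ScenarioPolicy("network", timedelta(minutes=30)),
}


def should_invoke_dr(scenario: str, elapsed: timedelta) -> bool:
    """Return True once the outage has reached the agreed limit for this scenario."""
    policy = POLICIES[scenario]
    return elapsed >= policy.max_outage


# The same 45-minute outage leads to different decisions per scenario.
print(should_invoke_dr("network", timedelta(minutes=45)))
print(should_invoke_dr("storage", timedelta(minutes=45)))
```

The threshold is fixed before the incident, so how close the team feels to a fix does not enter the calculation.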

In the heat of the moment, it might be tempting to ignore plans, especially if you believe your team is close to finding a solution. This is, however, often to your detriment and should be avoided.

Just as your Incident Response Plan provides instructions for responding to a number of potential security scenarios, your DR plan does the same for IT outages. During a security breach you will have a different response for dealing with ransomware in your environment than you would for a DDoS attack, for example. Your DR plan should take the same approach. By working with your CMT you'll be able to run DR tests for every likely scenario, which will allow you to familiarise yourselves with the plan and, more importantly, to become comfortable with invoking it during an incident.

Organisations should also consider scenarios in which it is not necessary to fail over. For larger organisations, going offline while addressing and containing an incident is considered the lesser of two evils when compared to the time-consuming and expensive nature of invoking a DR plan and moving to a recovery site. As a result, many of these organisations are exploring more flexible alternatives.

Through cloud computing, Disaster Recovery-as-a-Service (DRaaS) helps eliminate the cost of dedicated data centre space and recovery facilities. DRaaS solutions also mean organisations no longer have the burden of hardware and software upkeep. As a result, organisations adopting DRaaS are likely to find they now have the time to ensure effective procedures are in place to handle more complex incidents. In doing so, the decision to invoke a DR plan becomes less of a strain.

Contributed by Oscar Arean, technical operations manager, Databarracks.