Think cyber-attacks are a distant threat? The unprecedented global spread of WannaCry should have provided ample proof of just how dangerous and costly they can be. The attack affected organisations in more than 150 countries and some 300,000 Windows PCs, hitting high-profile organisations such as the UK's National Health Service, Russia's Ministry of Internal Affairs, FedEx, Nissan and Hitachi.
Such a high-profile incident should serve as a wake-up call for organisations and spur them to investigate whether their current data protection and disaster recovery (DP/DR) strategy is ready to handle this level of risk. In simple terms, businesses need to ensure that, in the event of such an attack, they are well positioned to be back up and running as quickly as possible, to minimise any data loss or disruption to customer experience, and to protect against future infection.
Getting backup right
In the event of an attack, the first port of call in trying to rescue the situation is the company's data backup. Assuming the organisation has the proper protections in place to safeguard the backup from being destroyed by the ransomware, it needs to ensure that the backup strategy is fit for purpose. For example, it may only be in the aftermath of a ransomware attack that a company discovers the last good backup is 24 hours old because it has set a daily recovery point objective (RPO). In addition, retrieving that data may take so long that the recovery time objective (RTO) for the backup could be as much as two days. Hardly ideal.
So what measures should an organisation take to reduce the RPO and RTO of its backup regime? One solution would be to look for backup products that provide snapshots for point-in-time recovery, locally and remotely, through native replication. This would significantly reduce RPO, to 15 minutes with asynchronous replication, or deliver zero RPO and near-zero RTO with synchronous replication. For virtual machines (VMs), it should be possible to rapidly restore the OS to the last usable point-in-time, reducing the downtime caused by affected VMs.
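To make the difference between a daily RPO and a 15-minute RPO concrete, here is a minimal Python sketch. The function names, the schedule and the attack time are all hypothetical and chosen purely for illustration; it assumes recovery points are taken on a fixed interval starting from a known first point.

```python
from datetime import datetime, timedelta


def last_recovery_point(attack_time, first_point, interval):
    """Return the most recent scheduled recovery point at or before the attack."""
    # timedelta // timedelta gives the number of whole intervals completed.
    completed = (attack_time - first_point) // interval
    return first_point + completed * interval


def data_loss_window(attack_time, first_point, interval):
    """Work done after the last recovery point and before the attack is lost."""
    return attack_time - last_recovery_point(attack_time, first_point, interval)


# Hypothetical scenario: recovery points start at midnight, attack at 23:50.
first_point = datetime(2017, 5, 12, 0, 0)
attack_time = datetime(2017, 5, 12, 23, 50)

daily_loss = data_loss_window(attack_time, first_point, timedelta(days=1))
quarter_hour_loss = data_loss_window(attack_time, first_point, timedelta(minutes=15))

print("Daily RPO loses:", daily_loss)
print("15-minute RPO loses:", quarter_hour_loss)
```

In this scenario the daily schedule loses almost a full day of changes, while the 15-minute schedule loses only the few minutes since the last recovery point.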
The right recovery
In most storage based on logical unit numbers (LUNs), the best an organisation can achieve is to recover all the applications on the same LUN to the same point-in-time. But as ransomware typically comes in waves and spreads throughout systems over time, the rate of infection for different applications on the same LUN can vary widely. For instance, the first attack might only affect 10 percent of VMs on the LUN, an equal number might be hit in a second attack several hours later and the rest might be completely unaffected. Forcing the recovery of the entire LUN means every VM will be recovered to the point-in-time necessary for the earliest affected VMs. This results in a completely unbalanced approach where 90 percent of VMs (the 80 percent that were never infected plus the 10 percent hit in the second attack) are recovered to an earlier point-in-time than they need, for the sake of the 10 percent that were hit in the first attack.
What is required is a solution that offers recovery on a per-VM basis, so that VMs with different levels of infection can each be restored to the point-in-time that is right for them.
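The cost of LUN-wide recovery compared with per-VM recovery can be sketched in a few lines of Python. This is an illustrative model only: the VM names, the hourly snapshot schedule and the infection times are invented, and times are expressed as hours since the first snapshot.

```python
from bisect import bisect_left


def latest_snapshot_before(snapshots, t):
    """Return the latest snapshot strictly before time t (snapshots sorted ascending)."""
    i = bisect_left(snapshots, t)
    return snapshots[i - 1] if i > 0 else None


def per_vm_recovery_points(snapshots, infection_times):
    """Each VM rolls back only as far as its own infection requires."""
    return {
        vm: snapshots[-1] if infected_at is None
        else latest_snapshot_before(snapshots, infected_at)
        for vm, infected_at in infection_times.items()
    }


def lun_wide_recovery_points(snapshots, infection_times):
    """Every VM on the LUN is forced back to the point needed by the earliest victim."""
    hits = [t for t in infection_times.values() if t is not None]
    point = latest_snapshot_before(snapshots, min(hits)) if hits else snapshots[-1]
    return {vm: point for vm in infection_times}


# Hypothetical data: hourly snapshots from hour 0 to hour 12.
snapshots = list(range(13))
infection_times = {"vm-a": 3.5, "vm-b": 9.5, "vm-c": None}  # None = never infected

per_vm = per_vm_recovery_points(snapshots, infection_times)
lun_wide = lun_wide_recovery_points(snapshots, infection_times)

# Hours of work rolled back, measured from the latest snapshot.
lost_per_vm = sum(snapshots[-1] - p for p in per_vm.values())
lost_lun_wide = sum(snapshots[-1] - p for p in lun_wide.values())
print("Per-VM recovery rolls back", lost_per_vm, "VM-hours")
print("LUN-wide recovery rolls back", lost_lun_wide, "VM-hours")
```

With these made-up numbers, the uninfected VM loses nothing under per-VM recovery but is pushed back nine hours when the whole LUN is restored.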
Enable time travel
The other issue to be aware of is that most backups limit businesses to restoring to a single point-in-time. Essentially, they are given a one-way ticket to the specified backup point and lose the ability to restore to any snapshots that were taken after that point. With ransomware, it can be hard for organisations to pinpoint the moment when the attacks started to affect their VMs. They may end up restoring to a point well before they were infected by the ransomware. With more modern storage systems, it is possible to move back and forth between recovery points to gain a clearer view of when VMs were affected and to restore to a point closer to the moment of infection.
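Where a storage system does allow moving back and forth between recovery points, the search for the last clean snapshot can be narrowed quickly rather than checked one snapshot at a time. The sketch below uses a binary search and rests on two stated assumptions: that once a VM is infected every later snapshot is also infected, and that some check exists to tell whether a given snapshot is clean. The is_clean function here is a hypothetical stand-in for that check, not a real product API.

```python
def last_clean_snapshot(snapshots, is_clean):
    """Find the most recent clean snapshot by moving back and forth between points.

    Assumes snapshots are in chronological order and that all snapshots
    taken after the infection are infected (clean ones come first).
    Returns None if even the oldest snapshot is infected.
    """
    lo, hi = 0, len(snapshots)
    # Invariant: every snapshot before index lo is clean,
    # every snapshot from index hi onwards is infected.
    while lo < hi:
        mid = (lo + hi) // 2
        if is_clean(snapshots[mid]):
            lo = mid + 1  # infection happened later: jump forward
        else:
            hi = mid      # already infected here: jump back
    return snapshots[lo - 1] if lo > 0 else None


# Hypothetical example: ten snapshots, infection lands between 5 and 6.
snapshots = list(range(10))
checked = []


def is_clean(snapshot):
    checked.append(snapshot)  # record which snapshots were inspected
    return snapshot < 6


best = last_clean_snapshot(snapshots, is_clean)
print("Restore to snapshot", best, "after inspecting", len(checked), "snapshots")
```

The point of the design is that only a handful of snapshots need to be inspected, which is only practical when restoring to one point does not discard the snapshots taken after it.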
If organisations deploy a modern and effective data backup and recovery strategy that achieves lower RPO and RTO and gets services up and running again with VM-level granularity, there is no reason why they should WannaCry after a ransomware attack.
Contributed by Dan Florea, director of product management, Tintri
*Note: The views expressed in this blog are those of the author and do not necessarily reflect the views of SC Media or Haymarket Media.