Recovering from a cyber attack cannot be left to improvisation. It must be carefully prepared in advance, for the day the worst happens. As the saying often attributed to Benjamin Franklin goes, “If you fail to plan, you are planning to fail.”
From a cyber security perspective, recovery activities feature prominently in the NIST Cybersecurity Framework.
While four of its functions relate to monitoring and protection, the fifth, Recover, although often overlooked by security specialists, is about enabling the resumption of business operations through the planned recovery (by restoration or reconstruction) of computing capabilities. And that is precisely what this article is about!
Disaster Recovery Planning
The disciplines associated with the recovery of IT services after (cyber-)disasters are Disaster Recovery Planning (DRP) or, more broadly, IT Service Continuity Management (IT-SCM), as described in ITIL. Developing effective continuity strategies and DRPs also helps meet several requirements of ISO/IEC 27001 (information security management).
A “successful” cyber attack targeting your IT systems is, in most cases, a real “logical” disaster (comparable to a flood, a fire or a plane crashing into your data centre) and can be so disruptive that the “standard” incident processes in place cannot handle it properly. Additionally, typical high-availability solutions (even when they span buildings, campuses or regions) cannot, in most cases, help recover from the logical data corruption that results from cyber attacks.
Therefore, special arrangements are needed to support organisations facing such disasters. The performance and effectiveness of disaster recovery solutions are measured using two main KPIs. These KPIs are also used when defining disaster recovery strategies and, later on, to confirm the effectiveness of those strategies during simulation exercises.
Recovery Time Objective
The first KPI is the Recovery Time Objective (RTO). This is the time allowed to resume normal business activity (point 5 in the schema below) after a major disruption (point 0).
This clearly highlights that the time allowed to recover IT (i.e. IT-DRP execution time) is only a fraction of the total period of disruption (as perceived by the end-users).
Working to optimize each step will help to 1) minimize the overall disruption or, alternatively, 2) give IT teams more time to perform recovery procedures.
Such optimization can be achieved by implementing proper tools, processes and documentation, but it is equally important to train the parties involved and familiarize them with the entire recovery scenario. This is also why regular exercises are so important for a successful recovery in the event of a real attack.
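To make the time budget concrete, here is a minimal sketch in Python. The phase names and durations are purely hypothetical and do not come from the schema; they only illustrate that the window left for IT recovery is whatever remains of the RTO once every other step has taken its share.

```python
from datetime import timedelta

# Hypothetical phases of a disruption, from the incident to business as usual.
# Names and durations are illustrative assumptions, not measured values.
phases = {
    "detection": timedelta(hours=2),
    "escalation_and_decision": timedelta(hours=3),
    "it_recovery": timedelta(hours=12),
    "data_validation": timedelta(hours=4),
    "business_restart": timedelta(hours=3),
}


def total_disruption(phases):
    """Total disruption as perceived by end users: the sum of all phases."""
    return sum(phases.values(), timedelta())


def it_recovery_budget(rto, phases, it_phase="it_recovery"):
    """Time left for IT recovery once all other phases are accounted for."""
    others = sum(
        (duration for name, duration in phases.items() if name != it_phase),
        timedelta(),
    )
    return rto - others


rto = timedelta(hours=24)  # assumed objective agreed with the business
total = total_disruption(phases)
budget = it_recovery_budget(rto, phases)

print(f"Total disruption: {total}, RTO met: {total <= rto}")
print(f"Budget available for IT recovery: {budget}")
```

With these assumed figures, shaving one hour off detection or decision-making either shortens the overall disruption by one hour or hands that hour to the IT teams, which is exactly the trade-off described above.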
Recovery Point Objective
The second KPI is the Recovery Point Objective (RPO). This is the maximum acceptable amount of data lost, measured as the time between the last valid copy of the data and the disaster.
Obviously, losing no data at all would be ideal, and today this is mainly achievable with synchronous data replication. Unfortunately, viruses, ransomware and the like are replicated instantly along with your precious data, rendering these copies useless against the threat of corruption.
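The following Python sketch illustrates why a very recent replica does not help after corruption. It assumes, for illustration only, that a copy is "valid" when it was taken before the moment of infection; the timestamps, the function name `data_loss_window` and the 24-hour objective are all hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(copies, infection_time, disaster_time):
    """Return the time between the last valid copy and the disaster.

    A copy is treated as valid only if it predates the infection
    (an assumption made for this sketch). Returns None when no valid
    copy exists at all.
    """
    valid = [taken_at for taken_at in copies if taken_at < infection_time]
    if not valid:
        return None
    return disaster_time - max(valid)


# Hypothetical timeline: nightly backups plus a near-real-time replica.
copies = [
    datetime(2024, 3, 1, 2, 0),    # nightly backup, clean
    datetime(2024, 3, 2, 2, 0),    # nightly backup, clean
    datetime(2024, 3, 3, 2, 0),    # nightly backup, already corrupted
    datetime(2024, 3, 3, 14, 59),  # replica, corrupted along with the source
]
infection_time = datetime(2024, 3, 2, 10, 0)
disaster_time = datetime(2024, 3, 3, 15, 0)

rpo = timedelta(hours=24)  # assumed objective
loss = data_loss_window(copies, infection_time, disaster_time)

print(f"Data loss window: {loss}, RPO met: {loss is not None and loss <= rpo}")
```

In this made-up timeline the replica is only one minute old, yet the last usable copy is 37 hours old, so a 24-hour RPO is missed.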
Solutions and caveats
The increasingly popular answer is immutable storage and backup solutions, which are meant to allow systems and data to be restored to a predetermined, consistent point in time.
Although this is correct in theory, one can still question the ability to recover when a large part of the computing landscape is affected by a virus, ransomware or the like. In fact, several prerequisites must be functional before effective data restoration can even start. Some examples (the list is not exhaustive):
- A working access system for buildings and rooms (or a remote access system, including e.g. internet, VPN, …),
- A clean and reliable physical infrastructure that can house the restored data,
- Usable access to the immutable storage server and the backup tool (assuming neither has been affected by the threat), unless you can connect directly or physically to the backup server with a local account,
- Access from the backup solution to the servers to be restored (which requires some form of Active Directory, Domain Name Servers and/or equivalent, …).
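The prerequisites above depend on one another, so the order in which they are restored matters. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+) to derive a feasible order. The step names and the dependencies between them are assumptions made for illustration; the real ones will differ for each organisation.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must be functional before it.
# Both the steps and their dependencies are hypothetical.
prerequisites = {
    "site_or_remote_access": set(),
    "clean_infrastructure": {"site_or_remote_access"},
    "backup_console_access": {"site_or_remote_access"},
    "directory_and_dns": {"clean_infrastructure", "backup_console_access"},
    "data_restoration": {
        "clean_infrastructure",
        "backup_console_access",
        "directory_and_dns",
    },
}


def missing_for(step, prerequisites, ready):
    """List the direct prerequisites of `step` that are not yet ready."""
    return sorted(prerequisites[step] - set(ready))


# A valid recovery order: every step appears after all of its prerequisites.
order = list(TopologicalSorter(prerequisites).static_order())
print("Recovery order:", " -> ".join(order))

# Example: only site access has been restored so far.
print("Blocking data restoration:",
      missing_for("data_restoration", prerequisites, ["site_or_remote_access"]))
```

Writing the dependencies down in this way, even on paper, helps reveal circular dependencies (for example, a backup console that can only be reached through the directory service it is supposed to restore) before a real incident does.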
Conclusion
Your mission, should you choose to accept it, is to ensure that everything is executed within the set timeframes and within the allowed data loss. Good luck with that!
Unless your organization already has a robust DRP that is regularly and thoroughly tested, with a proven track record of recovery times and results in line with expectations, your first action after reading this should be to raise a major risk with your CISO/security director and enterprise risk manager.
Christian De Boeck is a professional who, for more than 20 years, has been offering a holistic approach to improving the continuity and resilience of his clients. Christian is a member of the Business Continuity Institute and holds a doctorate in science from the University of Brussels (ULB); he is also a certified lead auditor for ISO 22301 (business continuity). Finally, Christian’s company, Synergit, is the result of the combination of his professional career as a researcher, consultant and coach.