Are you following Best Practice in your Disaster Recovery and Business Continuity preparation?

Posted: November 9, 2023

Every IT or senior executive agrees: an effective Disaster Recovery and Business Continuity plan is an absolute must-have. In practice, however, most organisations’ DR and BC plans are not regularly reviewed for currency or stress-tested for ease of implementation in an emergency.

Here’s our guide to best practice in setting, reviewing and regularly testing your DR and BC plan.

Risk Assessment and Business Impact Analysis:

Every Disaster Recovery and Business Continuity Plan must begin with a risk assessment. If you don’t know the true scale of the risks you face, you will inevitably over- or underestimate the scale of planning and response required when a DR or BC event occurs.

Form a cross-disciplinary team: Assemble a team consisting of IT professionals and broader business stakeholders, including key senior decision-makers, to conduct the risk assessment and business impact analysis.
Identify threats and vulnerabilities: Brainstorm and document all potential threats and vulnerabilities that could affect the IT systems. These may include natural disasters (e.g., fire, flood), human error, DDoS attacks, malware, malicious insiders, power outages, bomb threats, loss of network connectivity, hardware failure, licensing failure, data corruption, cloud provider failures, risks from trusted parties and more.
Assess business impact: Evaluate the potential impact of each identified threat on your IT systems and business operations. Consider factors such as data loss, downtime, financial losses, reputational damage, and legal implications.
Prioritise the risks: Score each identified risk from 1 to 5 on i) severity and ii) likelihood of occurrence, then combine the two scores into an overall score. Focus on the risks with the highest combined score first. If more than, say, three risks on your priority list share the same score, you may need to reevaluate them to determine the final order of priority.
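
As a rough illustration of the scoring step, here is a minimal Python sketch. The guide does not say how the two scores should be combined, so multiplying them is our assumption (adding them is equally common). The `Risk` class, the helper names and the sample register are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    severity: int    # 1 (minor) to 5 (catastrophic)
    likelihood: int  # 1 (rare) to 5 (almost certain)

    def __post_init__(self) -> None:
        for label, value in (("severity", self.severity), ("likelihood", self.likelihood)):
            if not 1 <= value <= 5:
                raise ValueError(f"{label} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        # Assumption: the two scores are combined by multiplication.
        return self.severity * self.likelihood


def prioritise(risks):
    """Return the risks sorted from highest to lowest combined score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


def ties_needing_review(risks, max_ties=3):
    """Map each score shared by more than `max_ties` risks to the risk names."""
    by_score = {}
    for risk in risks:
        by_score.setdefault(risk.score, []).append(risk.name)
    return {score: names for score, names in by_score.items() if len(names) > max_ties}


# Illustrative register; the names and scores are made up for this example.
register = [
    Risk("Ransomware", severity=5, likelihood=3),
    Risk("Power outage", severity=3, likelihood=4),
    Risk("Flood at primary site", severity=5, likelihood=1),
]
ranked = prioritise(register)
```

In this sketch `ranked` starts with the ransomware risk, and `ties_needing_review(register)` flags any score shared by more than three risks so the team knows which ones to revisit.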

Develop a Disaster Recovery and Business Continuity Plan (DR/BC Plan):

Your DR/BC Plan is designed to restore critical IT infrastructure and business functions within a tolerable timeframe following a disaster event. It’s the macro-level plan for getting back to business-as-usual and is distinct from a backup plan.

Establish objectives: Define the objectives of the DR/BC Plan, such as recovery time objectives (RTOs) and recovery point objectives (RPOs). RTO refers to the maximum acceptable downtime, while RPO is the maximum amount of data, usually measured as a period of time before the event, that you can tolerate losing.
Create an inventory of critical assets: Inventory all critical IT assets, including servers, databases, applications, and network devices. Using a tool such as RapidFire Tools’ Network Detective can ensure you develop a comprehensive list quickly. It will remove unconscious bias in the creation of the list too, just in case you might make assumptions about what is and isn’t on your network.
Plan for events that don’t just involve data: A truly robust plan involves preparation for loss of access to your premises, such as would be experienced in a fire, flood or contamination scenario. Can your teams work from home just as easily as from the office? If not, you will need to build in a plan for alternate premises, plus all the hardware that will be needed to reestablish operations.
Develop contingency plans: Create contingency plans for each critical business function, outlining alternative approaches to sustain operations during a disaster. For example, can you roll back to more manual methods while system recovery is undertaken?
Define recovery procedures: Develop detailed step-by-step procedures for recovering each critical IT asset. Remember, the person or team that creates the procedure may not be the same as the one who has to execute it in an emergency, so include clear instructions for data restoration, system reconfiguration, and testing.
Assign responsibilities: Assign clear roles and responsibilities to team members involved in the disaster recovery process. Designate a DR coordinator role to oversee the entire operation. Make sure that whoever holds this role can be reached at all times, even over holiday periods.
Assess security measures to protect confidentiality, integrity and availability: Review the existing security measures around the handling of information and who has access to it, to ensure that they are adequate to protect data and systems during disaster recovery operations.
Implement encryption and access controls during recovery: Enforce data encryption and access controls to prevent unauthorised access to sensitive information during recovery.
Comply with regulations: Ensure that the disaster recovery process you have designed complies with relevant industry regulations and data protection laws. For example, if you are going to run temporarily in the cloud, ensure you are not breaching the data sovereignty requirements of your industry or key clients.
Document contact information: Delegate the maintenance of up-to-date contact information for all team members, key vendors, and external partners to a particular role on your team, to ensure effective communication is possible during a disaster.
Explain employee roles: Clearly communicate each employee’s role and responsibilities during a disaster, emphasising their contribution to the recovery efforts.
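
To make the objectives and the asset inventory concrete, here is a small Python sketch. The `CriticalAsset` record, its field names and the sample values are hypothetical. Recovering the assets with the tightest RTO first is one reasonable ordering, not a rule from this guide. The RPO check rests on the worst case, where a failure strikes just before the next backup and a full backup interval of data is lost.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class CriticalAsset:
    name: str
    rto: timedelta              # maximum acceptable downtime
    rpo: timedelta              # maximum tolerable data loss, expressed as time
    backup_interval: timedelta  # how often a recoverable copy is taken

    def meets_rpo(self) -> bool:
        # Worst case: the failure happens just before the next backup,
        # so up to one full interval of data is lost.
        return self.backup_interval <= self.rpo


def recovery_order(assets):
    """Assumption: recover the assets with the tightest RTO first."""
    return sorted(assets, key=lambda a: a.rto)


# Illustrative inventory; the assets and figures are made up.
inventory = [
    CriticalAsset("File server", rto=timedelta(hours=24),
                  rpo=timedelta(hours=24), backup_interval=timedelta(hours=24)),
    CriticalAsset("Order database", rto=timedelta(hours=4),
                  rpo=timedelta(minutes=15), backup_interval=timedelta(hours=1)),
]
at_risk = [a.name for a in inventory if not a.meets_rpo()]
```

Here `at_risk` picks out the order database, because an hourly backup cannot satisfy a 15-minute RPO. Either the backup frequency or the objective has to change.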

Backup and Recovery Strategies:

A business-interrupting event might not rise to the level of a disaster – that’s where your backup and recovery plan is important.

Identify critical data: Work with your cross-disciplinary team to identify the most critical data and systems that need regular backups. This should be a natural offshoot from the risk identification and business impact analysis you conducted in the first step.
Select backup solutions: Choose appropriate backup solutions based on the RTOs and RPOs established in the DR/BC Plan. Consider options like incremental backups, differential backups, or full backups. Remember the 3-2-1 rule:

3 – Keep at least three copies of your data.
2 – Store the copies on two different types of storage media (e.g., hard drives, tapes, cloud storage).
1 – Have one copy of the data in an offsite location, away from the primary data centre or physical location of your on-premises data.
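
The 3-2-1 rule is simple enough to check automatically. The sketch below is a hypothetical helper, not part of any backup product. It assumes that your production data counts as one of the three copies and that you describe each copy yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    label: str
    media: str     # e.g. "disk", "tape", "cloud"
    offsite: bool  # True if stored away from the primary location


def check_3_2_1(copies):
    """Return a list of rule violations; an empty list means the rule is met."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, need at least 3")
    media_types = {c.media for c in copies}
    if len(media_types) < 2:
        problems.append(f"only {len(media_types)} media type(s), need at least 2")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    return problems


# Illustrative set-up: production data, a local backup and a cloud copy.
copies = [
    BackupCopy("production", media="disk", offsite=False),
    BackupCopy("local backup appliance", media="disk", offsite=False),
    BackupCopy("cloud repository", media="cloud", offsite=True),
]
violations = check_3_2_1(copies)
```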

Although we are vendor-agnostic and can work with any product, our preferred solution is Veeam’s backup and availability suite of products.

Set backup frequency: Define the frequency of backups based on the rate of data change, location of backup, RTO and RPO requirements.
Encrypt backups: Ensure that all backups are securely encrypted to protect sensitive information from unauthorised access.
Test the restoration process: Regularly perform test restorations of significant blocks of data to verify the integrity and accessibility of the backed-up data. Ensure that the backup restoration process works smoothly and efficiently and your in-house or external team is skilled at the procedure.
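
One way to verify a test restoration is to compare checksums of the restored files against a reference copy. The Python sketch below uses only the standard library. The function names are ours, and it assumes the reference directory has not changed since the backup was taken. In practice you would more likely compare against checksums recorded at backup time.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path):
    """Return relative paths that are missing or differ in the restored copy."""
    mismatches = []
    for source_file in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source_file.relative_to(source_dir)
        restored_file = restored_dir / relative
        if not restored_file.is_file() or sha256_of(source_file) != sha256_of(restored_file):
            mismatches.append(relative.as_posix())
    return mismatches
```

An empty list from `verify_restore` means every reference file was restored with identical content. It does not prove the restored application will start, so treat it as one check among several.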

Redundancy and High Availability:

DR/BC Planning is not just about how you will respond to a business interruption – it should allow you to prepare a robust defence against a business-interrupting event in the first place.

Identify single points of failure: Conduct a thorough assessment of the IT infrastructure to identify physical or virtual hardware components that represent single points of potential failure. Don’t forget about the connections in between: cabling can be a critical weakness too. Mapping your network is not only a good way to identify single points of failure, it is also a critical component of a well-run IT system, and the map should be kept up-to-date. There are many different network-mapping tools available.
Implement redundancy measures: Introduce redundancy for critical components, for example by using redundant storage, deploying multiple power sources, and employing network load balancers.
Set up high-availability configurations: Implement high-availability configurations for servers and network devices, ensuring that they can automatically switch to backup systems if the primary ones fail.
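
Once you have a network map, finding single points of failure can be partly automated. The Python sketch below takes a deliberately simple approach: remove each device in turn and check whether everything else can still reach everything else. The topology and device names are hypothetical. It assumes the map is an undirected adjacency listing that is fully connected to begin with. It models devices only, not individual cables or power feeds.

```python
from collections import deque


def reachable(graph, start, removed):
    """Return the set of nodes reachable from `start` while `removed` is offline."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour != removed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def single_points_of_failure(graph):
    """Return the devices whose loss would cut at least one other device off."""
    critical = []
    for candidate in graph:
        remaining = [n for n in graph if n != candidate]
        if not remaining:
            continue
        # If the survivors cannot all reach each other, the candidate is critical.
        if len(reachable(graph, remaining[0], candidate)) < len(remaining):
            critical.append(candidate)
    return sorted(critical)


# Hypothetical topology: each device maps to the devices it connects to.
network = {
    "internet": {"firewall"},
    "firewall": {"internet", "core-switch"},
    "core-switch": {"firewall", "switch-a", "switch-b"},
    "switch-a": {"core-switch", "switch-b", "server-1"},
    "switch-b": {"core-switch", "switch-a", "server-1"},
    "server-1": {"switch-a", "switch-b"},
}
spofs = single_points_of_failure(network)
```

In this example the firewall and the core switch are flagged, while the two access switches are not, because each provides a path around the other.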

Develop your Communication Plan:

When the users of your systems experience an interruption, they will begin a search for information. In the absence of clear, proactive communication, they will clog up your support mechanisms and their frustration may be misdirected towards the IT function.

Define communication channels: Establish primary and backup communication channels for both internal and external communication during a disaster. Remember, users aren’t going to receive emails when their network connection goes down – you will need an alternate method like Teams messages sent to mobile devices, SMS or WhatsApp notifications.
Develop contact lists: Create contact lists for all stakeholders, including employees, customers, vendors, and business partners.
Choose notification tools: How will you quickly deliver bulk notifications to your user body and other interested parties outside of email? SMS, automated phone calls, a recorded message and other options require pairing with a bulk communication tool. Email itself requires delivery groups to be created, with user roles mapped to them.
Outline notification procedures: Document step-by-step procedures for notifying stakeholders during a disaster, including who will initiate the notifications and how they will be disseminated.
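
The idea of primary and backup channels can be expressed as a simple fallback loop. The Python sketch below is illustrative only. `send_email` and `send_sms` are placeholder functions standing in for whatever email, SMS or chat integration you actually use. No real messaging API is called.

```python
from typing import Callable, Iterable, Tuple

Sender = Callable[[str, str], None]


def notify(recipient: str, message: str, channels: Iterable[Tuple[str, Sender]]) -> str:
    """Try each channel in order and return the name of the first that succeeds."""
    errors = []
    for name, send in channels:
        try:
            send(recipient, message)
            return name
        except Exception as exc:  # a failed channel must not stop the fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all channels failed: " + "; ".join(errors))


# Placeholder senders; replace these with real integrations.
def send_email(recipient: str, message: str) -> None:
    raise ConnectionError("mail server unreachable")  # simulates an outage


sent_log = []


def send_sms(recipient: str, message: str) -> None:
    sent_log.append((recipient, message))


used = notify(
    "on-call engineer",
    "Primary site is offline; DR plan activated.",
    [("email", send_email), ("sms", send_sms)],
)
```

With the simulated mail outage, the message falls through to the SMS placeholder. If every channel fails, the error lists what went wrong on each, which is useful material for your post-incident review.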

Disaster Recovery Testing:

A DR/BC plan is only as good as its last successful execution. Processes become out of date, experienced team members move on, and hardware and software can become unusable: these are just some of the issues you may encounter when you need your DR/BC Plan to work flawlessly. Half-yearly testing is how you ensure the inevitable obstacles to plan execution are discovered and removed before you really need the plan to work.

Plan testing scenarios: Develop a testing plan that includes various disaster scenarios, such as hardware failures, data corruption, cyber-attacks, and natural disasters. Consider the mode of testing each time. Are you just going to review the plan in writing and check the network diagram is still up-to-date, or will you run a mock exercise or even a full simulation that involves switching your primary services off and causing a failover to redundant systems? Although a full simulation is the best way to be sure your plan will work in a real emergency, it comes with its own risks: you may find out you cannot, in fact, recover as easily or quickly as forecast in your plan, and you may cause unnecessary financial and reputational damage. Take the time to weigh the pros and cons of your testing regime.
Schedule tests: Schedule regular DR tests, considering the complexity and impact of each scenario. Ensure that testing does not disrupt ongoing business operations.
Involve all stakeholders: Engage all relevant stakeholders, including IT staff, business units, and external vendors, in the testing process to simulate real-world conditions.
Document test results: Document the test results, including any issues or lessons learned. Use this information to refine the DR/BC Plan and improve the overall disaster recovery process.
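
Test results are easier to compare over time if they are recorded in a consistent structure. The Python sketch below shows one possible shape for such a record. The `DrTestResult` class and its fields are our own invention, and the sample figures are made up.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List


@dataclass
class DrTestResult:
    scenario: str
    rto: timedelta                # objective taken from the DR/BC Plan
    measured_recovery: timedelta  # time actually taken during the test
    issues: List[str] = field(default_factory=list)

    @property
    def met_rto(self) -> bool:
        return self.measured_recovery <= self.rto

    def summary(self) -> str:
        status = "PASS" if self.met_rto else "FAIL"
        return (f"{self.scenario}: {status} "
                f"(recovered in {self.measured_recovery}, RTO {self.rto}, "
                f"{len(self.issues)} issue(s) logged)")


# Illustrative result from a hypothetical failover test.
result = DrTestResult(
    scenario="Database failover",
    rto=timedelta(hours=4),
    measured_recovery=timedelta(hours=6),
    issues=["Restore runbook referenced a decommissioned server"],
)
report_line = result.summary()
```

A failed objective like this one is exactly the kind of finding that should feed back into the DR/BC Plan, either by improving the procedure or by revisiting the RTO.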

Regular Review and Updates:

Schedule regular reviews: Establish a schedule for periodic reviews of the DR/BC Plan, aiming to conduct them at least annually or whenever significant changes occur within the organisation.
Assess plan effectiveness: Evaluate the effectiveness of the existing plans based on the results of previous tests and real-world incidents.
Make necessary updates: Revise the DR and BC plans to address any identified weaknesses or changes in the IT infrastructure or business requirements.
Encourage awareness: Foster a culture of awareness and preparedness among employees, encouraging them to report any potential risks or vulnerabilities they observe.

To respond effectively to unexpected disruptions and disasters, you must prioritise disaster recovery and business continuity planning, and you must know that your plan will work. Start by conducting a thorough risk assessment and business impact analysis to understand the potential threats and vulnerabilities your organisation faces. Use our guide to develop a comprehensive DR and BC plan with clear objectives, including inventories of critical assets and detailed recovery procedures.

Focus on building redundancy and high availability to prevent business interruptions by identifying and addressing single points of failure. Implement communication plans to proactively inform stakeholders during a disaster, and ensure you regularly test and review your plans to keep them up-to-date and effective.

By following these best practices, you can enhance your organisation’s resilience and minimise the impact of disruptions, ensuring smoother business continuity and higher user satisfaction ratings of the IT department.