In our last blog post, we discussed some common misconceptions about DRaaS backup and recovery. This week, we cover what a DR Runbook is, why you need one, and what it should contain.

What is an IT Disaster Recovery Runbook?

A DR runbook is a working document, unique to every organization, which outlines the necessary steps to recover from a disaster or service interruption. It includes primary and escalation contacts along with infrastructure and process instructions for team members to follow.

It is an essential component of your company’s business continuity plan.

What is the Difference Between Business Continuity and Disaster Recovery?

While Business Continuity and Disaster Recovery planning are sometimes treated as synonyms, they are different. Business Continuity (BC) refers to the ability of an entire organization to continue critical functions and processes in the event of a disaster. Disaster Recovery (DR) is a documented and tested process that not only restores systems to production but also returns the business's IT functions to their pre-disaster state. In other words, DR is one part of the broader business continuity effort.


When should I create a DR Runbook?

Well, now. Before a disaster strikes, every organization should have documented processes for maintaining and recovering its IT environment.

The COVID-19 crisis made many businesses painfully aware of how unprepared they were to respond to a public disaster, and many businesses hit with ransomware have learned the hard way that they need to test multiple backups and set processes and timelines for failover and failback.

At the very least, you should create and/or update your DR Runbook when you: 

Add technology to your network, including cloud-based resources
Add or reassign roles in your IT department
Can’t recall all the steps in what should be a standardized process
Realize that your disaster recovery plan hasn’t been reviewed in months

Our Top 10 Runbook Requirements

Above all, your DR Runbook should give your organization a clear plan of action for recovering from a disaster.

While every organization is different, your DR Runbook should include:

An overview that explains your disaster recovery methodology
A list of your organization's hardware, software, and databases, along with the team members responsible for maintaining them
A table of your organization's contacts with cell phone numbers
A list of every possible alert for a particular service, along with instructions on what to do and whom to contact
Instructions on how to deploy the software and build a server from scratch

Support the written documentation with screenshots, diagrams, graphs, and/or tables.

1. Actionable

While it’s great to have the big picture spelled out with your governing IT methodology and marketing strategy, your DR Runbook should read more like a checklist than a white paper.

At its core, your DR Runbook is a list of tasks directed toward achieving a very specific goal. Each task should be clear, discrete and executable. Tasks that don’t serve that goal should be removed. 

As much as possible, the runbook should be written for an end user instead of an IT specialist. Don’t assume everyone knows the right directories, scripts, or servers where certain functions live. Spell out each task as one bullet point or line.  

That said, sometimes a sentence or two of context next to the task can also be helpful.  

2. Accessible

You need to ensure your DR Runbook is easily accessible. It should be protected, but never so secure that an end user can't find it when it is needed. Prepare both soft and hard copies of your runbook, and keep the hard copy in a prominent position near the equipment it serves.

Soft copies of your DR Runbook should be accessible through a password-protected cloud portal. Link to the runbook from the portal's opening pages so team members can reach it quickly from a cellphone.

Review your team authorizations to confirm that each team member has permission to access the runbooks they need.

Note: for cybersecurity reasons, never store network credentials in a runbook. Instead, tie that information to the contact responsible for the device.

As much as possible, add searchable metadata to your runbook sections for reference. For example, each section of your runbook might have:

 – Purpose or description (for an incident, scheduled maintenance, development) 

 – Creation date and time 

 – Latest update date and time 

 – Author(s) 

 – Major systems referenced 
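If your runbook lives in a searchable portal, the metadata fields listed above can be stored as one small record per section. The Python sketch below is one possible way to model them; the `SectionMetadata` class, its extra `title` field, the `search_sections` helper, and the sample systems are all hypothetical, and a real portal would more likely rely on its own built-in search.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SectionMetadata:
    """Searchable metadata for one runbook section (hypothetical field names)."""
    title: str                        # assumption: a title field to identify the section
    purpose: str                      # e.g. "incident", "scheduled maintenance", "development"
    created: datetime
    last_updated: datetime
    authors: list[str]
    systems: list[str] = field(default_factory=list)  # major systems referenced


def search_sections(sections: list[SectionMetadata], term: str) -> list[str]:
    """Return titles of sections whose purpose or referenced systems contain the term."""
    term = term.lower()
    hits = []
    for s in sections:
        haystack = [s.purpose.lower()] + [name.lower() for name in s.systems]
        if any(term in text for text in haystack):
            hits.append(s.title)
    return hits


# Illustrative sample data only.
sections = [
    SectionMetadata(
        title="Restore customer database",
        purpose="incident",
        created=datetime(2021, 1, 4, 9, 0),
        last_updated=datetime(2021, 3, 15, 14, 30),
        authors=["Database admin"],
        systems=["customer-db", "backup-server"],
    ),
    SectionMetadata(
        title="Patch file servers",
        purpose="scheduled maintenance",
        created=datetime(2021, 2, 1, 8, 0),
        last_updated=datetime(2021, 2, 1, 8, 0),
        authors=["Systems team"],
        systems=["file-server"],
    ),
]

print(search_sections(sections, "customer-db"))  # → ['Restore customer database']
```

Searching on purpose as well as systems lets a reader find, for example, every scheduled-maintenance section without knowing which servers are involved.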

Another good practice is to identify sections based on the type of alert so that your least-technical team member is able to find the correct tasks without having to review the whole runbook.  
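Organizing sections by alert type amounts to a lookup table from each alert to the section that handles it. A minimal sketch, assuming hypothetical alert names and section titles that you would replace with your own:

```python
# Hypothetical alert types mapped to the runbook section that handles them.
ALERT_INDEX: dict[str, str] = {
    "disk-failure": "Section 4: Replace the failed disk and rebuild the array",
    "ransomware": "Section 7: Isolate hosts and restore from a clean backup",
    "power-outage": "Section 9: Fail over to the secondary site",
}

# Fallback text for alerts the index does not cover (illustrative wording).
UNKNOWN_ALERT = "Unknown alert: contact the escalation lead listed in the contacts table"


def section_for_alert(alert_type: str) -> str:
    """Return the section to open for an alert, ignoring case and stray spaces."""
    return ALERT_INDEX.get(alert_type.strip().lower(), UNKNOWN_ALERT)
```

The fallback matters as much as the table: a reader facing an unlisted alert is pointed to a person instead of being left to improvise.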

3. Accurate

If your DR Runbook is out of date, your team members might experience the following during disaster recovery:

They may distrust and avoid the runbook, and waste time trying their own solutions.
They may restore some services and assume the crisis is over, unaware that newly added, mission-critical resources are still down, which delays your organization's return to normal operations.
They may execute the wrong tasks, not merely slowing recovery but making the crisis worse.
They may waste time trying to reach the wrong person responsible for a task.

To make sure your DR Runbook is accurate, review it regularly. This sounds obvious, but as organizations strive for a "zero-downtime" environment, they tend to spend all of their available time confirming that updates or new services function correctly, and even back up correctly, without ever considering the impact on their DR processes.

On a consistent rotation, test your DR Runbook with an end user and allow them to give feedback on its accuracy. 

Keep track of when a runbook was last updated and, if possible, when it was last run.
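Once update and run dates are recorded, stale runbooks can be flagged automatically. This sketch assumes a 90-day review interval (an illustrative value, not one taken from any standard) and treats a runbook that has never been run as needing review; the `needs_review` function is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed review cadence; choose the interval that fits your organization.
REVIEW_INTERVAL = timedelta(days=90)


def needs_review(last_updated: date, last_run: Optional[date], today: date,
                 interval: timedelta = REVIEW_INTERVAL) -> bool:
    """Flag a runbook if it has never been run, or if its last update
    or last test run is older than the review interval."""
    if last_run is None:
        return True  # an untested runbook is always flagged
    return (today - last_updated) > interval or (today - last_run) > interval


# Updated 111 days before the check date, so it is flagged.
print(needs_review(date(2021, 1, 10), date(2021, 3, 1), today=date(2021, 5, 1)))  # → True
```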

Remember, your DR Runbook should not become the gathering point for a variety of processes. It should remain focused on its single goal. Don't be afraid to split a runbook into several and cross-reference them where needed.

4. Authoritative

Make sure there is one runbook, and only one, for each process. Instead of adding post-its or written edits on hard copies, update the official DR Runbook and reprint with the correct “Last update.” Discard the older version. 

As needed, reference and link to other runbooks for certain processes. However, if there are multiple runbooks for a given scenario, combine them into one and make sure the others are archived.
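If you keep an inventory of your runbooks, the one-runbook-per-process rule can be checked mechanically. The sketch assumes a hypothetical inventory of `(runbook_id, process, is_active)` records in which archived runbooks are marked inactive and therefore ignored.

```python
from collections import defaultdict


def find_duplicates(runbooks: list[tuple[str, str, bool]]) -> dict[str, list[str]]:
    """Return each process that has more than one active runbook,
    mapped to the IDs of the conflicting runbooks."""
    active: defaultdict[str, list[str]] = defaultdict(list)
    for runbook_id, process, is_active in runbooks:
        if is_active:  # archived runbooks do not count
            active[process].append(runbook_id)
    return {process: ids for process, ids in active.items() if len(ids) > 1}


# Hypothetical inventory: two active runbooks cover the same process.
inventory = [
    ("RB-01", "database restore", True),
    ("RB-02", "database restore", True),
    ("RB-03", "network failover", True),
    ("RB-04", "network failover", False),  # archived
]
print(find_duplicates(inventory))  # → {'database restore': ['RB-01', 'RB-02']}
```

Any process this reports is a candidate for merging into a single runbook and archiving the rest.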

5. Adaptable

Software and team roles change over time, so your runbook must change too. Otherwise, it will become neither accurate nor actionable. Ways to encourage adaptability include:

Assign the responsibility for a runbook to the system owner. For example, the database admin is responsible for updating the DR Runbook database section.
Make DR Runbook review part of your system update and testing process.
Celebrate runbook accuracy during a quarterly review!
Automate when and where it makes sense. Creating automated scripts enables new team members to perform important tasks quickly.
Foster a culture of continuous improvement that includes updating runbooks regularly. We can't recommend allowing just "anyone" in your organization to modify the runbook, but you can encourage anyone to submit improvements to the system owner.

6. Comprehensive

Your runbook should not only include all of your IT resources but also consider a variety of disaster scenarios. Many organizations that were prepared for an internal technical crisis, like a disk failure, were caught flat-footed by the COVID-19 pandemic, which forced many employees to work remotely.

Disasters can be man-made (ransomware or other cybercrime, or intentional or unintentional sabotage by an employee) as well as natural or environmental (a water main break, power outage, earthquake, etc.).

7. Compliant and Audit Ready

Your organization may need to comply with one or more industry standards, such as NIST, HIPAA, FINRA, PCI, CCPA, CSET/DHS, or others. Make sure you refer to the latest documentation releases for specifics.

AllConnected can help guide your organization’s DR strategy toward compliance.  

Several industry standards also have independent auditing requirements. Every organization needs to maintain adequate records, including lists of contacts and of hardware and software vendors, along with dates that reflect upgrades and changes in business practices.

Store copies of your DR Runbook both onsite and offsite, and make sure they are available to those who require them.

Auditors will: 

 – examine records, billings, and contracts to verify that your organization is legally compliant.  

 – test your procedures to determine their effectiveness and make sure they meet your company objectives. 

Having a third party audit your Business Continuity and Disaster Recovery plans also gives stakeholders independent validation that your documentation is complete and accurate.

8. Able to Delegate

Just as your DR Runbook should be comprehensive enough to consider multiple disaster scenarios, it should be clear and accessible enough that subordinates can execute the plan if your primary team members are not available.

Consider both escalation levels (for more difficult or widespread crises) and delegation levels (for key team members). These levels should be reflected in your department training as well.
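Escalation and delegation levels can be modeled as ordered lists of contacts. In this sketch, which assumes a hypothetical `Contact` record and an availability flag you would maintain yourself, routine incidents try the delegation chain first and severe or widespread incidents try the escalation chain first; in both cases, the other chain serves as a fallback.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    """Hypothetical contact record; availability would come from your own on-call data."""
    name: str
    role: str
    available: bool


def assign(delegation_chain: list[Contact], escalation_chain: list[Contact],
           severe: bool) -> Optional[Contact]:
    """Return the first available contact in priority order.

    Routine incidents: delegation chain (primary owner, then trained delegates),
    then the escalation chain. Severe incidents: escalation chain first.
    """
    if severe:
        order = escalation_chain + delegation_chain
    else:
        order = delegation_chain + escalation_chain
    for contact in order:
        if contact.available:
            return contact
    return None  # nobody reachable; the runbook should say what to do in this case


# Illustrative chains: the primary database owner is unavailable.
delegation = [Contact("Ana", "Database admin", False), Contact("Ben", "Database delegate", True)]
escalation = [Contact("Cy", "IT director", True)]

print(assign(delegation, escalation, severe=False).name)  # → Ben
print(assign(delegation, escalation, severe=True).name)   # → Cy
```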

9. Validated

Your DR Runbook is of little use if it is not tested. AllConnected's "validated recovery" service for DR enables your organization not only to confirm the integrity of backups but also to run comprehensive tests to ensure that your entire IT environment can fail over to a secondary site if your production environment is compromised.

10. Integrated into Your Corporate Culture

A runbook should not be a one-time report that grows stale. It should be an integral part of your business processes.

Your DR Runbook should be revisited at every major corporate acquisition, at every new product launch, and at every system improvement procedure. 

Keep it simple so that new employees can understand it
Keep it short, but with enough detail that no outside references are required
Refer to it frequently, and have team members use it for training purposes
Make it visual: include screenshots showing each step in the procedure
Keep it handy on a searchable internal portal so employees know where to find it.

Final Thoughts

Once your DR Runbook meets the requirements above, you can have peace of mind knowing that it has been vetted and approved and that it delivers on the promise of disaster recovery.

To learn more, contact us through the form below to receive a free customized IT Disaster Recovery template for your own DR Runbook.