How to create an effective Business Continuity Plan

Do you know how to create an effective BCP? People still underestimate the challenges of disaster recovery in the real world. A continuity plan is not just an IT concern: it is about the survival of the entire business, so one must consider calamities from all angles. We rarely get advance notice that a crisis is about to strike. Even with a short warning, countless things can go wrong. Moreover, every calamity is unique.
Handling a crisis is therefore like a street fight: we use every trick we know against an opponent that often comes around the corner unexpectedly. That is exactly why we need a BCP: to give you, as the crisis management team (CMT) of your organization, the best weapons for dealing with the disaster successfully. For that, you need plans, and you need to test them. Everyone who has a role in the plan needs to know their place. Even those who should do nothing need to know not to get in the way.
Without a plan, you have probably never thought the situation through, and without well-formed thoughts, chances are you will take longer to get to a good result. If the recovery time then exceeds the maximum tolerable period of disruption (MTPD), there is a very real chance of severe, permanent damage that the organization may not be able to overcome. It may even cease to exist. So you need to know how to create an effective BCP!
In this contribution I give my own opinion, not that of any organization.
Author: Manu Steens

How BCPs and DRPs differ.

Business continuity (BC) means maintaining the necessary business functions and production, or recovering them quickly enough, in the event of a major disaster. Such a disaster can be a flood, bomb threat, fire, terrorist attack, pandemic, cyber-attack or war. A BCP outlines the guidelines an organization should follow when faced with such events. It covers people, resources, processes, buildings, partners, ICT and more.

The professional literature now agrees that a disaster recovery plan (DRP) is NOT the same as a BCP. According to many, a DRP focuses on restoring ICT functions and corresponds to what ITIL calls “IT Service Continuity Management”. It often does not get underway until the most serious blows and injuries (the immediate impact) of the crisis are over. A DRP is an integral part of a BCP, because the BCP focuses on the survival of the entire organization, ICT included.

An example

For example, if a train disaster renders a building unusable for an extended period, do you know how counter services to customers will continue? Will staff temporarily work from home, or telework in the broad sense? Will you provide alternative workstations? Or will you rent office containers?

A BIA

Note that a business impact analysis (BIA) is also part of the process of arriving at a BCP. A BIA identifies the impact of a sudden loss of business processes or products (see BIA, how to do it). That analysis answers the question of which processes are important in your organization. The result may be a division of the processes into time-critical, essential and necessary processes. Such an analysis also gives a first idea of the strategic approach you can take to the problems found (see strategic choices).
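
As a minimal sketch of how the result of a BIA could be recorded, the Python snippet below sorts processes into the three categories by their maximum tolerable period of disruption (MTPD). The process names and the threshold values (24 hours and 7 days) are purely illustrative assumptions; every organization has to set its own in its BIA.

```python
from dataclasses import dataclass

# Hypothetical thresholds, in hours; an organization sets its own in the BIA.
TIME_CRITICAL_MAX_HOURS = 24
ESSENTIAL_MAX_HOURS = 7 * 24


@dataclass
class Process:
    name: str
    mtpd_hours: float  # maximum tolerable period of disruption


def classify(process: Process) -> str:
    """Return the BIA category for a process, based on its MTPD."""
    if process.mtpd_hours <= TIME_CRITICAL_MAX_HOURS:
        return "time-critical"
    if process.mtpd_hours <= ESSENTIAL_MAX_HOURS:
        return "essential"
    return "necessary"


# Illustrative processes, not taken from any real organization.
processes = [
    Process("Counter services", 8),
    Process("Invoicing", 72),
    Process("Annual reporting", 720),
]

for p in processes:
    print(f"{p.name}: {classify(p)}")
```

In practice the category would rarely follow from a single number; the sketch only shows that the outcome of the BIA can be kept as simple, reviewable data.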

Why a BCP matters.

Whether your organization is small or large, what matters is serving the customer. So reputation matters, and so does survival as an organization. And there is no better test than “the real thing.”

As IT has penetrated our society so thoroughly, a huge number of DR solutions for recovering IT functionality are available. But what about the rest of the business processes? The organization’s reputation depends on them. Handling a crisis well usually has a good impact on reputation and can improve customer confidence. That’s why a BCP really does matter!

First step: creating a BCP, an outline of the steps.

  • If you don’t already have a BCP, start at the beginning: get to know the organization and its processes. It is best to make a complete list of the processes and map them out. A good technique for this is a process flow, or “swimlanes” in BPMN. An important advantage is that you can reuse the result for organizational management. Or vice versa: leverage the efforts of organizational management!
  • Determine in an operational risk analysis which processes are vulnerable, and to what. Then prepare an impact analysis per function, list all dependencies and include the risk analysis, whether or not you made it specifically for this BCM exercise. This leads to the BIA.
  • Then determine, based on the BIA, the fundamental strategies you want to use for the things management considers important.
  • Make the BC plan (see below).
  • Then test it (see the section on testing below).
  • Evaluate it.

The above is again a checklist, or a plan of action if you will.

Parties involved

Verify that all parties involved have their say and have access to the plan. Everyone who has a role to play in it should know the plan, know their role and have practiced it.

Remember that the DRP is part of the BCP. So it is best to check with the IT department whether IT Security has included the applications of the time-critical processes in its DRP, or to make arrangements to include them.

As the plan progresses, it is worthwhile to bring the necessary stakeholders to the table (not necessarily all together) for a conversation or interview. Also talk to people who have already lived through a disaster, successfully or not. They are a good source of fresh ideas, as people love to share their “war stories” and the clever moves from the field that saved the day.

A BCP, what does it say?

Note in advance: make it a BCP, not a manual on how to make a BCP.

Part 1 – the major outlines of the plan
o Plan objective, as imposed by management
o Plan scope, as imposed by management
o List of critical processes, with RTO and RPO
o Level of recovery for each process
o Priority of recovery
o The single points of failure that can affect the processes
o Determination of the types of situations for the plan
o Reference to other documents

Part 2 – Roles and responsibilities
o What teams are needed and what tasks do they have? What roles do they have?
o Who is in charge of what? What authority to do what?
o What should each role do? Provide instructions
o Provide a simple template for logging

Part 3 – What happens when one activates the plan
o How do you activate the plan? How do you activate the relevant teams?
o Can the plan be partially activated? When? How?
o Who can initiate the plan?
o Who needs to be informed?
o Who should do what if they think the plan should become active?
o What are the emergency procedures?
o What if the workplace is unavailable?
o How will critical processes be restored?

Part 4 – What are the communication processes?
o Who reports to whom?
o Call tree; WhatsApp group,…
o Escalation procedure
o Internal and external communication plan and communication platforms

Part 5 – Listings of supplies and contacts
o Key people and their contact information
o Crisis meeting rooms and recovery sites
o Contact lists of stakeholders, contractors, maintenance services, firms with SLAs, …
o Facilities and supplies
o Technology and communications
o Information, data and applications
o Legal documents and regulations
o Transportation and logistics
o Petty cash and emergency budgets

Part 6 – How to end the plan
o Process for ending the crisis and ceasing operation of the plan

Part 7 – Document management information
o Who owns the plan?
o Who authorizes the plan?
o Triggers for revising the plan
o To whom does one distribute the plan for review?
o Change management
o Version control

Part 8 – References
o References to laws and regulations
o References to legal documents and agreements
o Statement of confidentiality
o …

Part 9 – Navigation
o Content page
o No unnecessary explanations that don’t matter
o Put checklists and operational information up front.
o Put ancillary information in appendices.
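
The list of critical processes in Part 1 (with RTO, RPO and recovery priority) lends itself to being kept as structured data. The Python sketch below is one possible way to do that, not a prescribed format: it derives a recovery order and flags processes whose RTO is longer than their MTPD, which is the situation in which lasting damage becomes likely. All process names and numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CriticalProcess:
    name: str
    priority: int      # 1 = recover first
    rto_hours: float   # recovery time objective
    rpo_hours: float   # recovery point objective (tolerable data loss)
    mtpd_hours: float  # maximum tolerable period of disruption


def recovery_order(processes):
    """Sort by recovery priority, then by the tightest RTO."""
    return sorted(processes, key=lambda p: (p.priority, p.rto_hours))


def rto_exceeds_mtpd(processes):
    """Return the names of processes whose RTO is longer than their MTPD."""
    return [p.name for p in processes if p.rto_hours > p.mtpd_hours]


# Illustrative values only, not taken from any real plan.
plan = [
    CriticalProcess("Invoicing", 2, 48, 24, 72),
    CriticalProcess("Counter services", 1, 4, 1, 8),
    CriticalProcess("Payroll", 2, 24, 24, 16),
]

for p in recovery_order(plan):
    print(f"{p.priority}. {p.name} (RTO {p.rto_hours} h, RPO {p.rpo_hours} h)")

print("RTO longer than MTPD:", rto_exceeds_mtpd(plan))
```

A process that shows up in the second list needs either a faster recovery strategy or a management decision to accept the risk.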

Testing, what could that be?

Construct the plan rigorously, and therefore also test it rigorously. But it is best to build the testing up gradually. The testing calendar depends on your organization: on the key personnel, the business processes, ICT, change management, and so on.

Some commonly used tests are below.

| Type of test | Purpose of the test | What and how |
|---|---|---|
| Desk testing | Test and quality assessment of the content of the BCP. | Step 1: Desk check: The contents of the documents are gone through by the author and the ICT manager of Operations ICT. In this process, the contents of the documents are reviewed and discussed between the two parties. |
| Desk testing (continued) | Test of the contents of the BCP. | Step 2: Desktop walkthrough: The author meets with each of the crisis management members and sits down with each of them to walk through the same documents as in the “Desk check” step. The author explains in more detail the operation and purpose of the documents. |
| Desk testing (continued) | Use a BCP scenario and recovery schedule to walk through the continuity plan to validate that the BCP contains both the necessary and sufficient information to enable a successful recovery. | Step 3: Desktop scenario: Participants come together and take a concrete case, discussing the appropriate scenarios and diagrams as a test of whether they understood the first two steps. |
| Communication test | Test out the contact numbers (phone numbers) for the people in the crisis phone book and the cascading call BCM schedule. (Are they up to date?) | The first year, a communication test is agreed upon in advance. In later years this test is set up unannounced. The goal is for participants to call the right people (each other) within a predetermined time frame. |
| Pillar BCP scenario and recovery approach schedule exercise (small sandbox) | Use a BCP incident scenario to role-play for management to test that, for one pillar’s continuity plan, the recovery schedules are functionally sound. | The organizational control working group creates a role-play based on a scenario. The idea here is to train the participants within a single pillar and test the scenario steps of that pillar. For the pillar being trained, the BCP administrator and BCP teams participate. The other roles in the crisis team are played by the respective members of the organizational control working group. This testing is done for each pillar. |
| Full BCP scenario and recovery approach schedule exercise (large sandbox) | Use a BCP incident scenario to role-play for management to test that the continuity plan and recovery schedules are functionally sound. This test can be used to test the cooperation between working groups after each has already done a test separately. | The organizational control working group creates a role-play based on a scenario. The purpose here is to train the participants of all pillars together and test the scenario steps. Participants are the members of the crisis team and the BCP teams. |
| Disaster Recovery test | D/R test: test that the ICT systems can be restored in the D/R site. | ICT Operations produces an annual scenario regarding the efficacy and workability of the D/R site. |
| Test crisis meeting rooms | Test the functioning of crisis meeting rooms. | Team Security holds a meeting in the crisis meeting room at the main location and at the alternate location. Telephone and Internet connections are tested by ICT Operations. The contents of the closets are checked against the inventory. |
| Activity test | Move business activities to an alternate location or to tele/home work for a predetermined time to test that participants can use their systems, applications and information, and continue to perform their critical processes. | The entity’s teleworkers can all be asked to work from home together for a period of time (to be specified, e.g. one day). Exceptions are granted only to ICT operations people and for meetings. Work is done on the servers in Brussels. |
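
For the communication test, it can help to estimate beforehand whether the cascading call schedule can be completed within the agreed time frame at all. The Python sketch below walks a call tree and computes when each person is reached. The tree, the names, the five minutes per call and the 15-minute deadline are all hypothetical assumptions, and the model is deliberately naive: everyone answers at once, and each caller phones their contacts one after another.

```python
from collections import deque

# Hypothetical call tree: each person phones the people listed, in order.
# The sketch assumes every person appears only once in the tree.
CALL_TREE = {
    "crisis manager": ["HR lead", "ICT lead"],
    "HR lead": ["HR officer"],
    "ICT lead": ["ICT operator 1", "ICT operator 2"],
}


def reach_times(tree, root, minutes_per_call):
    """Return the minute at which each person has been reached."""
    reached = {root: 0}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        # A caller makes calls one after another, starting once reached.
        for position, contact in enumerate(tree.get(caller, []), start=1):
            if contact not in reached:
                reached[contact] = reached[caller] + position * minutes_per_call
                queue.append(contact)
    return reached


def late_contacts(tree, root, minutes_per_call, deadline_minutes):
    """List everyone who is reached after the agreed time frame."""
    times = reach_times(tree, root, minutes_per_call)
    return sorted(name for name, t in times.items() if t > deadline_minutes)


times = reach_times(CALL_TREE, "crisis manager", minutes_per_call=5)
for name, minute in times.items():
    print(f"{name}: reached after {minute} min")
print("Too late:", late_contacts(CALL_TREE, "crisis manager", 5, 15))
```

If the estimate already exceeds the time frame on paper, the call tree is too deep or too narrow, and the real test will only confirm it.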

At each stage of testing the BCP, it is best to involve (new) observers. They see the gaps that the authors have long since lost sight of.

And then, review and improve your BCP

You put a lot of time and sweat into your first version, as well as into testing and communicating the BCP. Now this plan should not gather dust on a shelf in a closet or in a folder on a server. If it does, the plan becomes unusable, and often even untraceable when you need it.

Technology evolves, processes change, and people come and go, both at customers and at suppliers. So keep the plan up to date by reviewing it periodically. A common rule of thumb is annually. Discuss it with all parties involved.

What you should also do is consult with staff even before you review the BCP. Ask all departments in the organization to review the plan. Include outposts and VACs, even those abroad. After each test and after each crisis, record the lessons identified and lessons learned: what worked, and what didn’t.

How do you provide support for the BCP, and awareness?

What you should not do is do nothing, à la “laissez faire, laissez passer.” The organization must support the BCP top down, with top management setting the example by endorsing its importance. So top management must know the plan, its contents and revisions, and the tests and their results. Top management cannot and should not delegate this endorsement of the plan to middle management or subordinates.

Management is also important for awareness on the work floor. After all, staff are not insensitive to the wishes of their management. And if staff do not know about a plan, how could they possibly respond appropriately? You have to have someone from the top behind you to get things going. For that, you can do all sorts of things. A joint team-building activity (at different levels) around BCM in which the plan is used, with some room for imagination, can work very well. After all, nothing works as well as when people are having a good amount of fun with it. And the plan gains more credibility as a result.

Finally: topics for sandboxes.

| Building out | Infrastructure out (Facilities) | Lack of technology | Lack of key people | Lack of Supplier/Manufacturer | Other |
|---|---|---|---|---|---|
| Electricity breakdown | Water supply breakdown | Telecom failure | Epidemic / Pandemic | Manufacturer goes bankrupt | Internal fraud |
| Fire | Heating breakdown | Network failure | Working from recovery location with minimal space | Supplier undergoes own supply chain failure | Reputation issue |
| Flooding | Air conditioning failure | Malware | Terror attack NBC / Anthrax etc. | Supplier technology network fails | Data theft |
| Bomb (notification) | Fuel oil problems | Seizure of ICT systems by court after fraud | Bomb (notification) | Terrorist attack | Corruption of data by (ex) internal employee |
| Terrorist attack | Electricity breakdown | (Cyber)war | Food poisoning | (Cyber)war | Environmental disaster |
| Sealing locations by court after fraud | War | Maintenance firms fail | Environmental disaster | Electricity breakdown | (Cyber)war |
| Sealing locations by court after attack/murder | Suppliers fail | Building with ICT servers out | Transport strike / problems | Internal fraud supplier | Malware (Business failure) |
| War | Maintenance firms fail | Electricity breakdown | War | Earthquake | Electricity breakdown (Business failure) |
| Earthquake | Earthquake | IT Supplier Fails/Bankrupt (HB) | Earthquake | Euro fails (?) | IT Supplier fails / goes out of business (Business failure) |
| Suspicious behavior | Earthquake | Internet fails | Sabotage | | Monster fine Europe etc. |
| Vandalism | Internet fails | Euro fails | | | Belgium cracks |
| Burglary | Euro fails (?) | heat wave | | | Euro fails |
| Sabotage | heat wave | Sabotage | | | Aging |
| Suspicious object or package | Sabotage | | Physical aggression | | Massive inflation |
| Hold up | | | | | |
| Hostage | | | | | Relationship with the press |
| Tiger Kidnapping | | | Tiger Kidnapping | | |
| Mad archer | | | Mad archer | | |
| Suicide | | | Suicide | | |
| Child disappearance | | | Child disappearance | | |

Manu Steens

Manu works at the Flemish Government in risk management and Business Continuity Management. On this website, he shares his own opinions regarding these and related fields.
