In this post I will show how cloud computing can make modern disaster recovery (DR) planning affordable and actually capable of handling a major crisis. But first, let’s look back at the typical DR strategy of a company that is not a multi-billion-dollar enterprise and cannot afford the capital expense of fully redundant data centers.
Pre-Cloud DR Scenario
Company XYZ has two data centers and has hired a company to pick up tape backups from both locations each day and store them safely at an off-site location a few states away. Company XYZ does not have the capital for an exact replica of its main data center, so it had to carefully choose a few core systems to replicate as closely as possible. The end result is that some systems are fully redundant, but most are only replicated in an active-passive mode on lesser hardware. Company XYZ has determined that the less critical systems will have to run in a degraded state, because a fully redundant, active-active disaster recovery plan is simply too expensive to justify. Company XYZ has a full-time resource dedicated to business continuity and disaster recovery. This person has assembled a SWAT team of business and IT domain owners and set out to create a large document that defines the company’s strategy in the event of a disaster. They have documented a ton of processes that must be carried out in the event of a disaster for the business to have a prayer of getting back up and running. There are huge logistical hurdles, such as:
- How do they get their people to the backup site in the event of a disaster?
- How fast can they get these passive systems up and active?
- How do they keep the documentation and the plan up to date as the systems change over time?
- How can they schedule and practice outages?
- How do they keep all the active and passive systems current on patches, firmware, parts, etc.?
The list goes on. For most companies that don’t have nine- or ten-digit IT budgets, disaster recovery planning is an exercise in mitigating risk in the event of a disaster rather than ensuring that the business can run uninterrupted. After all, the cost of fully redundant, active data centers is hard to justify against the odds of a disaster; at least, that is what we thought before 9/11 and Katrina! Now we know better ;-) To sum it up, many DR plans are constrained by budget and simply try to mitigate risk with whatever capital is available (or whatever IT was able to beg from the CFO). Most DR plans are a disaster waiting to happen!
Cloud DR Scenario
Company ABC is a two-year-old company that built its solution from scratch in the cloud. It runs multiple fully redundant virtual data centers in an active-active configuration. Company ABC has deployed redundant servers behind redundant load balancers across multiple virtual data centers, which are physically located in different data centers. Under its active-active DR strategy, all zones run identical hardware and software all the time. In fact, Company ABC uses all of its assets all the time, not just when an outage occurs.
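To make the active-active idea concrete, here is a minimal in-memory sketch of a load balancer that spreads requests round-robin over every healthy node in every zone. It is not any cloud provider’s API: the `Node` and `ActiveActiveBalancer` names, the zone names and the round-robin policy are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    zone: str
    healthy: bool = True


class ActiveActiveBalancer:
    """Hypothetical model: round-robin over healthy nodes in all zones."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = list(nodes)
        self._counter = 0

    def healthy_nodes(self) -> List[Node]:
        return [n for n in self.nodes if n.healthy]

    def route(self) -> Node:
        """Pick the next healthy node, whatever zone it is in."""
        pool = self.healthy_nodes()
        if not pool:
            raise RuntimeError("no healthy nodes in any zone")
        node = pool[self._counter % len(pool)]
        self._counter += 1
        return node

    def fail_zone(self, zone: str) -> None:
        """Simulate losing an entire zone: its nodes stop receiving traffic."""
        for n in self.nodes:
            if n.zone == zone:
                n.healthy = False


# Usage: two web nodes in each of three (hypothetical) zones.
nodes = [Node(f"{z}-web{i}", z) for z in ("zone-a", "zone-b", "zone-c") for i in (1, 2)]
lb = ActiveActiveBalancer(nodes)
served = sorted({lb.route().zone for _ in range(6)})
print(served)  # ['zone-a', 'zone-b', 'zone-c']: every zone carries traffic

lb.fail_zone("zone-b")
after = sorted({lb.route().zone for _ in range(6)})
print(after)   # ['zone-a', 'zone-c']: traffic diverted to surviving zones
```

Because every zone already carries traffic, losing one only shrinks the pool; nothing has to be switched from passive to active.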
[Figure: each layer of the architecture replicated across multiple zones]
The image above shows how each layer of the architecture is fully redundant across multiple zones. If one zone failed, the other zones would pick up the diverted traffic and automatically scale up more nodes if needed to accommodate the spike in traffic. So let’s see how this approach addresses the issues listed above:
- How do they get their people to the backup site in the event of a disaster? They don’t need to. These are virtual data centers with virtual machines. Everything can be managed from a browser. The admins only need to get to a place that has Internet access. They don’t need to go to a physical data center.
- How fast can they get these passive systems up and active? They are already active and running 24×7.
- How do they keep the documentation and the plan up to date as the systems change over time? They still go through the same process of collecting application- and business-specific information. The difference is that the actual plan and the execution of the DR strategy are much simpler and require far less logistics.
- How can they schedule and practice outages? Simply shut down a zone and watch the other zones scale up.
- How do they keep all the active and passive systems current on patches, firmware, parts, etc.? Apply patches to the standard image(s) and snapshot them. Take the servers built from the old image offline, launch new servers from the patched image, and deploy the code on them. Test, then put the new servers into rotation behind the load balancers.
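The scale-up that happens when a zone fails, and that a practice outage exercises, comes down to simple arithmetic. The following sketch assumes load is spread evenly across zones, a fixed per-node capacity in requests per second and a 20% headroom target; the function names and numbers are illustrative assumptions, not measurements.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def nodes_per_zone(peak_rps: int, rps_per_node: int, zones: int,
                   headroom_pct: int = 20) -> int:
    """Nodes each zone must run so `zones` zones carry peak_rps plus headroom."""
    if zones < 1:
        raise ValueError("at least one zone must survive")
    total_nodes = ceil_div(peak_rps * (100 + headroom_pct), 100 * rps_per_node)
    return ceil_div(total_nodes, zones)


def outage_drill(peak_rps: int, rps_per_node: int, zones: int,
                 headroom_pct: int = 20) -> tuple:
    """Nodes per zone in steady state vs. with one zone shut down."""
    before = nodes_per_zone(peak_rps, rps_per_node, zones, headroom_pct)
    after = nodes_per_zone(peak_rps, rps_per_node, zones - 1, headroom_pct)
    return before, after


# Hypothetical load: 3,000 req/s at peak, 100 req/s per node, three zones.
print(outage_drill(3000, 100, 3))  # (12, 18)
```

Shutting down one of three zones means each surviving zone must grow from 12 to 18 nodes, a 50% increase; a drill is how you confirm that autoscaling actually gets there in time. Running 18 nodes per zone all the time is the alternative: it costs more but removes the dependence on scale-up.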
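Patching by image replacement is essentially a rolling replacement. Here is a minimal sketch that handles one server at a time in the order described: take it offline, launch a replacement from the patched image, deploy the code, test, then put it into rotation, stopping the rollout if a test fails. `Server`, `rolling_replace`, the image IDs and the health check are hypothetical; a real deployment would call its cloud provider’s and load balancer’s APIs at each step.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Server:
    name: str
    image: str                       # ID of the machine image it was launched from
    code_version: Optional[str] = None
    in_rotation: bool = False        # receiving traffic from the load balancer?


def rolling_replace(pool: List[Server], new_image: str, code_version: str,
                    health_check: Callable[[Server], bool]) -> List[Server]:
    """Replace servers one at a time so the rest of the pool keeps serving."""
    result: List[Server] = []
    for i, old in enumerate(pool):
        if old.image == new_image:
            result.append(old)                    # already on the patched image
            continue
        old.in_rotation = False                   # take the old server offline
        new = Server(old.name, new_image)         # launch from the patched image
        new.code_version = code_version           # deploy code on the new server
        if not health_check(new):                 # test before it takes traffic
            old.in_rotation = True                # roll back and stop the rollout
            result.append(old)
            result.extend(pool[i + 1:])           # untouched servers stay as they were
            return result
        new.in_rotation = True                    # put it into rotation
        result.append(new)
    return result


# Hypothetical image IDs: "std-v1" is the current standard image,
# "std-v2" the snapshot taken after applying patches to it.
pool = [Server(f"web{i}", "std-v1", "1.4", True) for i in (1, 2, 3)]
pool = rolling_replace(pool, "std-v2", "1.4",
                       health_check=lambda s: s.code_version is not None)
print([(s.name, s.image, s.in_rotation) for s in pool])
# [('web1', 'std-v2', True), ('web2', 'std-v2', True), ('web3', 'std-v2', True)]
```

Replacing one server at a time only works if the remaining servers (and zones) can absorb the load while one is out, which is exactly what the active-active design provides.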
With the right architecture, your cloud-based DR plan can actually provide fully redundant virtual data centers that are active, not passive, and available 24×7 for a fraction of the cost. The need to purchase or lease multiple data centers, and to hire the staff who manage and operate them, is greatly reduced. In other words, DR can be cost-effective and feasible in the cloud. At the end of the day, the money you spend can actually prevent a disaster instead of causing one.