When Disaster Strikes: How Robust is Your DR Capacity Planning & Testing?
Folks are calling BS on their DR capacity planning and test process. You know what I’m saying. You spent money on a DR project…but will it actually work? And not the weekend “test.” Will it actually work in the middle of a work week under normal workloads and transaction volumes?
In a nutshell, the fear is:
- How do I know with certainty that I have the capacity to fail over the workloads necessary to sustain business productivity and continuity?
- How do I minimize costs while safely maximizing utilization of the DR infrastructure I chose to invest in?
And the concern that ties it all together:
- How do I manage everything in the event of a disaster?
Let’s address these in order. Evaluating capacity is a game of best guesses, and the uncertainty is amplified by compounding factors: failing over only mission-critical workloads, repurposing older heterogeneous hardware, and ensuring that whatever hardware you choose is just-enough-but-not-too-much.
What we’re realizing is that these topics are inherently related. Enterprises want a functional DR policy, but at the right cost and management-complexity tradeoff.
Unless you’re best buds with your CFO, we bet your DR infrastructure will run at higher utilization and VM-to-host density than your primary site. An effective DR strategy can sustain a very efficient infrastructure, somewhere in the neighborhood of 80-85% utilization, yet planning (and managing) to that target is exceedingly difficult.
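To make the target concrete, here is a minimal sketch of the aggregate check behind that kind of planning: do the failover workloads fit within a utilization ceiling on the DR hosts? All names, capacities, and the 85% ceiling are illustrative assumptions, not figures from any product.

```python
# Hypothetical sketch: can a DR site absorb a set of failover workloads
# while staying under a target utilization ceiling? Numbers are made up.

def dr_headroom(host_capacities_ghz, workload_demands_ghz, target_util=0.85):
    """Return (fits, projected_utilization) for an aggregate capacity check."""
    total_capacity = sum(host_capacities_ghz)
    total_demand = sum(workload_demands_ghz)
    projected = total_demand / total_capacity
    return projected <= target_util, projected

# Example: three repurposed hosts, five mission-critical workloads.
fits, util = dr_headroom([40.0, 32.0, 24.0], [20.0, 18.0, 15.0, 12.0, 10.0])
print(fits, round(util, 2))  # True 0.78
```

Note what this sketch hides: it only sums totals. Whether each workload actually fits on a specific host, under real placement rules, is a harder question.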
Most enterprises we talk with are using spreadsheets (and dartboards) to capacity plan for their DR scenarios. The problem: those spreadsheets don’t help you manage workloads after disaster strikes, and they don’t model compliance constraints like affinity and anti-affinity rules, or real-world storage and compute performance issues like latency, swapping, and queuing.
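A toy example shows why a spreadsheet sum misleads. The sketch below, with purely illustrative VM names, capacities, and a simple first-fit heuristic, places workloads while honoring anti-affinity rules; total capacity can exceed total demand and placement can still fail.

```python
# Hypothetical sketch: first-fit placement honoring anti-affinity rules
# (pairs of VMs that must not share a host). All inputs are illustrative.

def place(vms, hosts, anti_affinity):
    """vms: {name: demand}; hosts: {name: capacity};
    anti_affinity: set of frozenset VM pairs that must be separated.
    Returns {vm: host} or None if no feasible placement is found."""
    placement = {}
    free = dict(hosts)
    for vm, demand in sorted(vms.items(), key=lambda kv: -kv[1]):
        for host in free:
            if free[host] < demand:
                continue
            # skip hosts already holding an anti-affinity partner
            if any(frozenset((vm, other)) in anti_affinity
                   for other, h in placement.items() if h == host):
                continue
            placement[vm] = host
            free[host] -= demand
            break
        else:
            return None  # capacity sums balance, yet placement fails
    return placement

vms = {"db-a": 16, "db-b": 16, "web": 8}
hosts = {"dr-1": 32, "dr-2": 12}
rules = {frozenset(("db-a", "db-b"))}
print(place(vms, hosts, rules))  # None: 40 GHz of demand fits in 44,
                                 # but the anti-affinity rule doesn't
```

Drop the rule and the same workloads place cleanly, which is exactly the gap between a cell formula and a constraint-aware plan.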
Imagine if airlines treated Thanksgiving-eve like any other Wednesday evening. Now imagine that stress, panic, and anxiety in the office while the business is hemorrhaging money because it planned DR without knowing exactly how the workloads would behave on the DR hardware under peak load. (And you know who gets the blame for that.)
Short of pulling the plug on your primary site on your business’s Black Friday (and crossing your fingers), the best way to test your DR scenarios is to simulate your secondary datacenter with actual workload signatures. DR capacity planning must be done with an understanding of how the workloads will behave in real time. After all, how else can an enterprise plan for, and then sustain, high utilization during a DR event?
Furthermore, in such an event (a true disaster), does one really want to be deciding which workloads run where? Or should such tedious placement decisions be handled automatically by software: the same software that quantitatively determined the capacity required, and that can later manage and maintain the entire environment at a high-but-safe level of utilization, in complete automation?
We encourage our customers to use VMTurbo to plan their DR scenarios and verify their DR policy using our datacenter simulation features. The impact: zero guesswork when planning for DR, zero stress when implementing and executing a DR strategy, and zero QoS impact when DR goes into effect.
Or you can keep crossing your fingers.