This is part of a series of articles looking at real operational situations and how virtualization management solutions react.
You know the drill. You have to perform routine infrastructure maintenance, such as a hypervisor update. You start by live-migrating guest workloads off the update target to other hosts in the cluster, then you put the target into maintenance mode so it is unavailable.
So, what’s the impact on the in-place management solution in your virtual environment?
For tools with “learning” algorithms, like vCenter Operations Manager (VCOMS), every host still in service immediately shows abnormal resource utilization because of the additional guest loads. Smart alerts may be generated for these abnormal conditions, and they repeat as you roll through the cluster updating each host. Depending on how long the update takes, the tool may learn a new “normal” and then see another abnormal situation when you complete maintenance, generating even more alerts. So you either ignore the alerts or (as many resort to) turn all alerts off during the maintenance period. Turning off alerts means your management system isn’t really adding value during maintenance downtime.
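To make the "new normal" problem concrete, here is a minimal sketch of a learned baseline. It is not how vCenter Operations Manager or any other product computes its thresholds; the function name `baseline_alerts`, the six-sample window, the three-sigma rule, and the utilization numbers are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def baseline_alerts(samples, window=6, k=3.0):
    """Return indices of samples that deviate more than k standard
    deviations from the mean of the preceding `window` samples."""
    alerts = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Floor sigma at 1.0 so a very flat history does not flag tiny noise.
        if abs(samples[i] - mu) > k * max(sigma, 1.0):
            alerts.append(i)
    return alerts

# Hypothetical CPU utilization (%) for one host that stays in service.
normal = [40, 41, 39, 40, 42, 40, 41, 39]               # indices 0-7
maintenance = [60, 61, 59, 60, 61, 60, 59, 60, 61, 60]  # indices 8-17
after = [40, 41, 39, 40]                                # indices 18-21

cpu = normal + maintenance + after
alerts = baseline_alerts(cpu)
print(alerts)
```

With these made-up numbers the detector fires once when the extra guests arrive (index 8), goes quiet as the higher load becomes the learned baseline, and fires again when the load drops back after maintenance (index 18).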
For tools with fixed thresholds, whether the increased resource utilization exceeds the threshold depends on how much headroom your hosts had to begin with. Call it a 50/50 chance, and if it does, you generate a bunch of alerts. Again, you have to choose to either ignore or disable alerts. How do you optimize the performance of your cluster during this period? All of the long-term trends and graphs collected up until this point are meaningless, because the utilization on one or more hosts is now unlike anything in the past. Also, once you’ve completed maintenance, that increase in resource usage stays in your history going forward. How do you ignore these data points in your 30-day trend analysis when you’re back to “normal”?
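The arithmetic behind both questions can be sketched in a few lines. This is an illustration only: it assumes the evacuated load spreads evenly over the remaining hosts, and the helper names, the 75% threshold, and the sample data are hypothetical rather than taken from any particular tool.

```python
def utilization_during_maintenance(normal_util, hosts, hosts_out=1):
    """Average per-host utilization (%) when the load of `hosts_out`
    hosts is spread evenly over the hosts that remain in service."""
    remaining = hosts - hosts_out
    if remaining <= 0:
        raise ValueError("at least one host must stay in service")
    return normal_util * hosts / remaining

def trend_mean(samples, maintenance_windows):
    """Mean utilization over (timestamp, value) samples, skipping any
    sample whose timestamp falls inside a [start, end) maintenance window."""
    kept = [
        value
        for t, value in samples
        if not any(start <= t < end for start, end in maintenance_windows)
    ]
    return sum(kept) / len(kept)

THRESHOLD = 75.0  # hypothetical fixed alert threshold (%)

# Same 60% starting utilization, different cluster sizes.
small_cluster = utilization_during_maintenance(60, hosts=4)  # 80.0
large_cluster = utilization_during_maintenance(60, hosts=8)  # about 68.6
print(small_cluster > THRESHOLD, large_cluster > THRESHOLD)

# Hypothetical daily samples; days 2 and 3 were the maintenance period.
samples = [(0, 40), (1, 42), (2, 60), (3, 62), (4, 38)]
raw_trend = trend_mean(samples, [])
clean_trend = trend_mean(samples, [(2, 4)])
print(raw_trend, clean_trend)
```

The four-host cluster crosses the threshold and the eight-host cluster does not, so the alert outcome hinges on cluster size and headroom. Leaving the maintenance window in the data also pulls the trend up (48.4 versus 40.0 here), and excluding it means someone has to record when each window started and ended.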
In contrast, VMTurbo Operations Manager responds to a maintenance scenario the same way it responds at any other time. It continually drives the environment to the “desired state”: one where workloads running in virtual machines get the resources they need to perform optimally while utilization of infrastructure resources is maximized. It recognizes the host in maintenance mode and immediately seeks to optimize the environment for the remaining hosts. Operations Manager automates decision-making about resource allocation and workload placement to assure the performance of workloads and applications running in virtual machines during the maintenance period, providing a “To Do” list of actions that keep the environment in the “desired state.” The “To Do” list is continuously updated as you roll the updates through the cluster, assuring optimal performance during this period of reduced capacity. And when you come out of maintenance mode, Operations Manager updates the “To Do” list to optimize the environment for full capacity.
There’s no need to turn off or ignore alerts, no new “normal” learned, and no trial-and-error placement or configuration changes required to avoid a disruption of service. Operations Manager provides control before, during and after the maintenance period.
Sounds a lot less risky to me.