Storage Control Beyond the Datastore

August 21st, 2014 by

The storage layer is truly the foundation of a virtual environment.  Persistent or transient problems within the storage units often cause massive performance impacts and inefficiencies all the way up the stack to your applications.  Managing your storage is one of the most difficult tasks in IT operations because of how complex and configurable a storage environment can be.

To put this into perspective, imagine a fairly small environment with 5 datastores.  If each datastore is responsible for 25 virtual machines, simply keeping track of how much capacity is available to all 125 virtual machines is challenging, let alone managing it.  The problem intensifies quickly when you consider other factors such as usage, IOPS, thin-provisioning risk, and latency.  With every control point and metric that can be accessed for each datastore, managing performance and efficiency at the datastore level is nearly impossible to do in real time.

Yesterday I was talking to someone involved in datacenter operations for a large insurance company, who told me that his SQL and ECM servers were causing massive amounts of latency on nearly every datastore their images were located on.  Worst of all, the SQL servers were tied to his five most critical business applications.  His attempt to triage this issue was to wait for latency to rise above a threshold of 40 milliseconds, then move the highest consumers of IOPS to another datastore with more availability, which only caused latency elsewhere.
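To make that reactive approach concrete, here is a minimal Python sketch of the policy as described: wait for a datastore to cross 40 ms, then move its busiest VM to whichever other datastore looks most available.  The `VM` and `Datastore` classes, the field names, and the sample numbers are all hypothetical, and "availability" is assumed to mean free space only.  This is an illustration of the manual process, not of how VMTurbo makes decisions.

```python
from dataclasses import dataclass, field

LATENCY_THRESHOLD_MS = 40  # threshold from the anecdote above


@dataclass
class VM:
    name: str
    iops: int


@dataclass
class Datastore:
    name: str
    latency_ms: float
    free_gb: int
    vms: list = field(default_factory=list)


def reactive_triage(datastores):
    """Move the top IOPS consumer off each datastore above the threshold.

    Returns a list of (vm_name, source_name, target_name) moves.
    """
    moves = []
    for source in datastores:
        if source.latency_ms <= LATENCY_THRESHOLD_MS or not source.vms:
            continue
        others = [d for d in datastores if d is not source]
        if not others:
            continue
        # Assumption: "more availability" is modeled as most free space,
        # ignoring the I/O load the target is already carrying.
        target = max(others, key=lambda d: d.free_gb)
        busiest = max(source.vms, key=lambda vm: vm.iops)
        source.vms.remove(busiest)
        target.vms.append(busiest)
        moves.append((busiest.name, source.name, target.name))
    return moves


ds1 = Datastore("ds1", latency_ms=55.0, free_gb=100,
                vms=[VM("sql01", 9000), VM("web01", 500)])
ds2 = Datastore("ds2", latency_ms=12.0, free_gb=800,
                vms=[VM("app01", 1500)])
moves = reactive_triage([ds1, ds2])
print(moves)
```

Nothing in this policy asks what the target can absorb in terms of IOPS, so the heavy VM simply carries its demand to the next datastore, which matches what he observed.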

Before we installed the storage control module, there were already numerous recommendations to move virtual machines among datastores, resize VStorage capacities, and resize a couple of datastores.  We took nearly every storage move to relocate the VMs' images, and it solved a large portion of the IOPS congestion and latency issues.  In fact, within his main storage cluster, VMTurbo's placement decisions alone reduced the overall latency by approximately 500 ms.  A thorough understanding of the underlying capacity of every datastore and virtual machine allowed VMTurbo's decision engine to place and size workloads across his infrastructure so that every VM got the resources it needed from the datastores to support the demand of the applications.

While VMTurbo was able to drastically reduce latency issues across his datastores, a few of the heavy hitters, such as the SQL and ECM servers, still had higher than acceptable levels of latency.  After reviewing the actions taken, it became apparent that some of the recommendations had moved his SQL servers between datastores on the same underlying aggregate.  I explained that, at that level of control, VMTurbo does not see deeper than the datastores; thus, in order to successfully triage (and more importantly prevent) issues such as latency, it is crucial to understand every point of contention, every resource, and every VM's storage demands beyond the datastores.
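The same-aggregate problem is easy to picture with a small Python sketch.  The datastore-to-aggregate mapping, the names, and both helper functions below are hypothetical; in practice this information would have to come from the array itself, which is exactly the visibility that was missing at the datastore level.

```python
# Hypothetical mapping of each datastore to its backing aggregate.
DATASTORE_AGGREGATE = {
    "ds1": "aggr0",
    "ds2": "aggr0",
    "ds3": "aggr1",
}


def relieves_aggregate(source, target, mapping=DATASTORE_AGGREGATE):
    """Return True if the move lands on a different backing aggregate."""
    return mapping[source] != mapping[target]


def candidate_targets(source, mapping=DATASTORE_AGGREGATE):
    """List datastores backed by a different aggregate than the source."""
    return sorted(d for d, aggr in mapping.items()
                  if aggr != mapping[source])


print(relieves_aggregate("ds1", "ds2"))  # False: both sit on aggr0
print(candidate_targets("ds1"))          # ['ds3']
```

A move from `ds1` to `ds2` looks like relief at the datastore level, but the same disks end up serving the same I/O.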

I suggested installing the storage control module to provide vision into his NetApp arrays and filers.

Within 15 minutes of setup and initial discovery, every placement action was now moving his SQL, ECM, and other heavy consumers of IOPS among datastores that consumed from different RAID groups.  We experimented by taking these new placement decisions and, to no surprise, the module's control and depth solved contention and latency at the disk array level.  The outcome was that every single SQL server and ECM server experienced latency no higher than 6 ms.  Within one week we saw improved storage resilience to varying workload demands, which kept his storage environment in a consistently healthy state.  His IT staff even took a couple of sizing actions to increase the VStorage capacity in their development cluster, and were pleasantly surprised that it had zero impact on the health of their storage units.

In summary, VMTurbo Operations Manager uses a common data model that understands all the interdependencies within your infrastructure and delivers software-driven decisions in real time to ensure QoS while driving the highest possible utilization across your storage platform.  An awareness of de-duplication, compression, IOPS consumption, latency, and thin provisioning at both the array level and the virtual layer allows the software to identify the action that will have the greatest impact across your IT stack.  When the storage environment is healthy, VMTurbo provides preventative actions for space optimization across the arrays, snapshot management, storage controller performance management, and identification of I/O and latency constraints to ensure threats do not evolve into issues or alerts.

This article is about performance. Read more like it at the [Performance, Efficiency, Agility] series.
