Will auto-tiering solve your virtualization challenges?

October 3rd, 2014 by

Automated storage tiering allows virtualized environments to adapt rapidly to fluctuations in VM storage activity. The ability to move data among storage drives according to its speed and space requirements improves accessibility. Yet while auto-tiering solves accessibility at the array level, it cannot assure performance or efficiency at the hypervisor layer. Managing datastore congestion and availability remains unsolved. So how can you maximize your company’s investment in your ATS environment?

In a discussion with a law firm last week, the director of IT highlighted the firm’s recent implementation of an auto-tiering system from Compellent Technologies (Dell). He created different classes of service based on the underlying drives, allowing high-IOPS consumers to sit on faster drives while historical documents, notices, and smaller VMs were placed on the firm’s extensive SATA environment. Despite such advancements at the array layer, the team had difficulty managing the thin-provision risk of their heavily used databases alongside their less “hungry” legal documentation files and VMs. I explained how VMTurbo can complement their storage architecture.

To address thin-provision risk on the datastores in an auto-tiered environment, our system views each datastore as both a consumer and a provider of resources. Let’s look at managing thin-provision risk using this resource relationship model.

The customer thin-provisioned many VMs and datastores to drive higher densities across his storage units. The problem is that the more thin provisioning you do on your datastores, the greater the risk of running out of space if the workloads start demanding more storage. At what level of risk do you step into your storage environment and start taking actions to guarantee that what you have provisioned does not cause space congestion?
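To make that risk concrete, here is a minimal sketch of how overcommitment on a single datastore could be measured. The `Datastore` class, its field names, and the sample figures are hypothetical and chosen for illustration; they are not taken from the customer's environment or from VMTurbo's product.

```python
from dataclasses import dataclass


@dataclass
class Datastore:
    name: str
    capacity_gb: float      # physical space the datastore can actually hold
    used_gb: float          # space written so far
    provisioned_gb: float   # total size promised to thin-provisioned disks


def overcommit_ratio(ds: Datastore) -> float:
    """Promised space divided by real capacity; above 1.0 means overcommitted."""
    return ds.provisioned_gb / ds.capacity_gb


def worst_case_shortfall_gb(ds: Datastore) -> float:
    """Space that would be missing if every disk grew to its full provisioned size."""
    return max(0.0, ds.provisioned_gb - ds.capacity_gb)


# Hypothetical datastore: 1 TB of capacity, 600 GB written, 1.8 TB promised.
ds = Datastore("ds-db-01", capacity_gb=1000, used_gb=600, provisioned_gb=1800)
ratio = overcommit_ratio(ds)
shortfall = worst_case_shortfall_gb(ds)
print(f"{ds.name}: overcommit {ratio:.2f}x, worst-case shortfall {shortfall:.0f} GB")
```

The datastore looks comfortable when judged by written space alone (60% full), yet it has promised far more than it can deliver. That gap is the thin-provision risk the question above is asking about.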

Managing the tradeoffs between efficiency and performance in your storage environment becomes very complex. Our control system delivers automated service assurance through storage placement decisions and by reconfiguring vStorage partitions or the datastore itself, in order to eliminate the tradeoffs between provisioned risk and efficiency. In the customer case above, rather than moving the “bully VMs” (databases) around and causing contention elsewhere, our system recommended a few SvMotions of smaller applications to other datastores.

It’s not always as simple as moving the high consumers of space among the datastores. The decision process goes back to supply and demand. Moving a bully VM significantly increases the space demands on the destination datastore. If we cannot match the available supply of space with the added demands of the databases, then we are making a tradeoff: lower thin-provision risk on one datastore in exchange for space constraints on another. VMTurbo’s real-time analysis identified the right VMs to move so that the underlying supply of all resources (space, IOPS, latency, etc.) matches the resource demands of the customer’s VMs. Driving a storage environment to this economic equilibrium effectively eliminates the risk involved in thin provisioning and assures that we do not cause IOPS bottlenecks or latency constraints.
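The tradeoff is easy to see with a toy example. The sketch below tries every single-VM move between two datastores and picks the one that leaves the lowest peak provisioned-to-capacity ratio. It is a simple brute-force illustration over space only, not VMTurbo's actual analysis, and the datastore names, VM names, and sizes are all made up.

```python
def peak_ratio_after_move(datastores, vm, src, dst):
    """Highest provisioned/capacity ratio across all datastores if vm moves src -> dst."""
    ratios = []
    for name, ds in datastores.items():
        vms = dict(ds["vms"])  # copy so the real inventory is untouched
        if name == src:
            vms.pop(vm)
        if name == dst:
            vms[vm] = datastores[src]["vms"][vm]
        ratios.append(sum(vms.values()) / ds["capacity_gb"])
    return max(ratios)


def best_single_move(datastores):
    """Evaluate every possible single-VM move and return the one with the lowest peak."""
    candidates = []
    for src, ds in datastores.items():
        for vm in ds["vms"]:
            for dst in datastores:
                if dst != src:
                    peak = peak_ratio_after_move(datastores, vm, src, dst)
                    candidates.append((peak, vm, src, dst))
    return min(candidates)


# Hypothetical inventory: provisioned size in GB per VM.
datastores = {
    "ds-a": {"capacity_gb": 1000, "vms": {"db-bully": 700, "app-1": 150, "app-2": 100}},
    "ds-b": {"capacity_gb": 500, "vms": {"docs-1": 200}},
}

bully_peak = peak_ratio_after_move(datastores, "db-bully", "ds-a", "ds-b")
peak, vm, src, dst = best_single_move(datastores)
print(f"Moving the bully leaves a peak ratio of {bully_peak:.2f}")
print(f"Best move: {vm} from {src} to {dst}, peak ratio {peak:.2f}")
```

Moving the database relieves ds-a but overcommits ds-b, while moving one smaller application lowers the peak on both. A real decision would also have to weigh IOPS and latency, which this sketch ignores.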

I urged the customer to take the recommended actions to verify that the outcome actually drives out the thin-provisioned risks within the environment. Once the customer took those actions, the levels of thin-provisioned space dropped without impacting Compellent’s ability to auto-distribute utilization across the drives.

[Image: auto-tier 1]

As a result, the customer watched his storage environment improve right before his eyes. Provisioned storage utilization dropped from between 85% and 90% to less than 45% across the three drives. The crucial ingredient in these decisions is defining the desired state of the environment before you act, in other words, defining the equilibrium between supply and demand in your IT marketplace. Based on usage, capacity, and demand, our software defines an optimal operating zone for the datastore environment where resource allocation is improved, a higher VM/datastore density is achieved, and VMs are not battling for storage resources. After all, if we don’t know where we are going, then how can we get there?

The intelligence behind this desired state allows the system to drive out thin-provision risk and make sure that prescriptive actions are taken to maintain the entire environment in the desired state.

Imagine trying to work out which of the hundreds of VMs to move and where to move them, all without knowing what other performance impacts or inefficiencies each move might cause.

There are simply too many data points changing in real time to evaluate by hand which VMs to move or resize to eliminate thin-provision risk. Adapting to the swings in provisioned space on the datastores becomes difficult to manage in real time. Rapid data fluctuations occur as a result of daily operations: every snapshot that is left open uses extra disk space, every VM that boots reserves swap space on disk equal to the amount of RAM allocated to it, and so on. In fact, provisioned space on a datastore that has little provisioned on it can skyrocket if a heavy 50 GB database is migrated to it. Isolating the provisioned risks is tricky, but deciding how to drive out that risk and prevent it from recurring through ongoing sizing, placement, and capacity decisions is even harder.
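A quick back-of-the-envelope sketch shows how fast these everyday events add up. The helper names and all figures other than the 50 GB database are hypothetical. The swap calculation assumes vSphere-style behavior, where the swap file is sized to the VM's configured RAM minus any memory reservation, so with no reservation it equals the full RAM allocation.

```python
def swap_file_gb(configured_ram_gb, reserved_ram_gb=0.0):
    """Swap space claimed at power-on (assumption: configured RAM minus reservation)."""
    return max(0.0, configured_ram_gb - reserved_ram_gb)


def projected_utilization(capacity_gb, used_gb, events_gb):
    """Fraction of the datastore in use after the listed space-consuming events."""
    return (used_gb + sum(events_gb)) / capacity_gb


capacity_gb = 200.0
used_gb = 40.0  # a lightly loaded datastore, 20% full

events_gb = [
    50.0,                                    # the 50 GB database migrated in
    swap_file_gb(16.0),                      # a 16 GB VM boots with no reservation
    swap_file_gb(8.0, reserved_ram_gb=8.0),  # fully reserved VM needs no swap file
    10.0,                                    # growth of an open snapshot (made-up figure)
]

before = used_gb / capacity_gb
after = projected_utilization(capacity_gb, used_gb, events_gb)
print(f"Utilization goes from {before:.0%} to {after:.0%}")
```

Four routine events nearly triple the utilization of a datastore that looked almost empty. Multiply that by hundreds of VMs and the case for automating these decisions becomes clear.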

Overall, VMTurbo kept the datastores in a healthy yet efficient state, while the auto-tiered structure kept data on the optimum drives for faster accessibility. Yet there are still other layers of contention beyond the drives (aggregate/pool, datastore, and VM). Even if you have auto-tiering in your arrays, how do you plan on controlling your storage environment from the aggregate layer and up?



This article is about performance. Read more like it at the Performance, Efficiency, Agility series.
