IT Self-Service: Lessons from a teenage nightmare

February 3rd, 2015

We all remember the first time we got the keys to the family car.  I can remember feeling excited and nervous, but over the course of a few months I figured I knew everything I needed to know to operate my dad’s brand new car.  That is, until I wrecked it.

Twice.

In less than 24 hours.

Yes, you read that last line correctly.  I took my dad’s car, hit a parked car, and then rear-ended another car, all within the span of a single day.

I thought I knew how to operate the first new car my dad had ever bought.  I thought it was easy.  I was very wrong.

The more people I talk to about how they are leveraging the power of virtualization, the more their big initiative for 2015 seems to center on creating an IT self-service process that gives normal humans the same power as virtualization admins.  These mere mortals now have the power to create whatever environment they want.

This is like handing the keys of a Ferrari to a sixteen-year-old boy and sending him out to face the world.  You now have people creating workloads with no understanding of ready queues or memory allocation.  They pick the size of the guest based on what they think they need, rather than what they really need.  This puts an incredible amount of pressure on the operations teams trying to manage this new normal.

The key to managing an environment comes down to understanding three key allocation decisions: sizing, placement, and capacity.  These three decisions mean the difference between delivering a quality platform that others can build upon and the operations team’s worst nightmare of war rooms and fire drills.

Sizing

A customer explained to me that a big challenge in rolling out their self-service platform was that the developers could not give them good enough specs to build out the service catalog.  The developers were not sure which OSes they wanted, so they just said, “give us all of them.”  Then came the question of how to size the templates; they did not know that either, so they requested every possible combination.  This sizing dilemma was slowing down the delivery of the service to the business.

When it comes to sizing, most of the time it is based on feel or on third-party specifications.  This leads to over-sized or under-sized guests, meaning you either waste resources or starve the application.  The over/under problem is the result of not understanding the demand that needs to be satisfied and instead basing the size on what the person making the request thinks is needed.  The result is strain on the capacity of the infrastructure, because resources are allocated but never used, or an application that suffers because it was never given the resources it needs to perform properly.

To get sizing right, the demand of the application needs to be the driving force behind the size.  The service catalog can then be simplified to OS flavor and a much simpler spec, because demand will drive the ultimate size of the virtual infrastructure.  Demand-based sizing eliminates the need for a bloated service catalog with multiple capacity tiers.

Demand-driven sizing is a much more elegant way of sizing VMs, and ultimately helps down the road when you want to augment your internal environment with a hybrid cloud strategy.
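To make the idea concrete, here is a minimal sketch of demand-based sizing: observed peak demand plus a safety margin drives the spec, instead of a catalog of fixed small/medium/large tiers.  The function name, inputs, and numbers are invented for illustration, not any particular product’s API.

```python
import math

# Hypothetical sketch of demand-based sizing: the observed demand of the
# application, plus a safety margin, drives the VM spec instead of a
# pre-built catalog of small/medium/large templates.
def size_from_demand(peak_vcpu_used, peak_mem_gb_used, headroom=0.2):
    """Return a spec sized to observed peak demand plus headroom."""
    vcpu = max(1, math.ceil(peak_vcpu_used * (1 + headroom)))
    mem_gb = max(1, math.ceil(peak_mem_gb_used * (1 + headroom)))
    return {"vcpu": vcpu, "memory_gb": mem_gb}

# The requester only has to pick the OS flavor; the spec comes from demand.
request = {"os": "rhel7", **size_from_demand(peak_vcpu_used=3.1, peak_mem_gb_used=10.5)}
print(request)  # {'os': 'rhel7', 'vcpu': 4, 'memory_gb': 13}
```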

Placement

The sizing of the virtual infrastructure is just one part of the equation.  Sizing impacts the placement of the workload.

I was with another customer who saw 40 VMs spin up in one of his clusters and trigger all kinds of alerts, but by the time he started to investigate, they had vanished.  What had happened was that a user had submitted a build request and then cancelled it five minutes later.  In those five minutes, however, the running application workloads were impacted because of where that order had been placed.

Proper placement is critical not only to ensure that the application environment being provisioned will perform well, but also because you have to consider the workloads that are already running.  You cannot degrade the performance of either workload, the new or the existing, so placement becomes a huge issue.

There are so many factors to consider that placement takes real thought and planning.  Normally, research is done first and placement is made conservatively to minimize the impact on the environment.  Once you implement self-service, that control is gone: the trade-off calculations have to be made on the fly, at the time of fulfillment, and that is incredibly hard to do.
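Here is a rough sketch, with made-up host names and numbers, of the simplest version of that on-the-fly trade-off: reject hosts where the new VM would push utilization past a threshold, then prefer the host left with the most headroom.  Real placement also has to weigh ready queues, affinity rules, storage, and network, which is exactly what makes doing it at fulfillment time so hard.

```python
# Hypothetical placement sketch: pick the host that can absorb the new VM
# without pushing CPU or memory past a utilization limit, and among those,
# the one left with the most headroom.
def place(vm, hosts, limit=0.8):
    candidates = []
    for h in hosts:
        cpu_after = (h["cpu_used"] + vm["vcpu"]) / h["cpu_total"]
        mem_after = (h["mem_used"] + vm["mem_gb"]) / h["mem_total"]
        if cpu_after <= limit and mem_after <= limit:
            candidates.append((max(cpu_after, mem_after), h["name"]))
    if not candidates:
        return None  # no safe home for this workload right now
    return min(candidates)[1]  # lowest post-placement utilization wins

hosts = [
    {"name": "esx-01", "cpu_total": 32, "cpu_used": 24, "mem_total": 256, "mem_used": 180},
    {"name": "esx-02", "cpu_total": 32, "cpu_used": 12, "mem_total": 256, "mem_used": 96},
]
print(place({"vcpu": 4, "mem_gb": 13}, hosts))  # esx-02
```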

Capacity

Capacity is the hardest to control.  If you cannot control sizing and placement, then this is where you pay for it, literally.  If the workload is not properly sized and placed, then you are either always on the edge of falling over or so over-provisioned that you are spending money needlessly to make sure you never get near that edge.  This is a problem in an environment where every dollar is becoming harder to justify.

Physical assets need to be utilized to their fullest extent while still delivering a quality service.  If not, you will have to answer to someone: either to the business for dodgy service, or to the finance department for sprawling hardware and software costs.  In this do-more-with-less world, such failure is not an option.
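As a small illustration of why capacity is where poor sizing and placement get paid for, here is a sketch that estimates how many days of headroom a cluster has left at its current growth rate.  The figures and the utilization ceiling are invented for the example.

```python
# Hypothetical capacity sketch: given current utilization and a steady
# growth rate, estimate how many days remain before the cluster crosses
# its safe utilization ceiling.
def days_of_headroom(used, total, daily_growth, ceiling=0.85):
    if daily_growth <= 0:
        return float("inf")
    remaining = total * ceiling - used
    return max(0.0, remaining / daily_growth)

# 410 GB used of a 640 GB cluster, growing 3 GB/day, 85% ceiling.
print(round(days_of_headroom(used=410, total=640, daily_growth=3)))  # ~45 days
```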

If your company chooses to leverage IT self-service, all three decisions, sizing, placement, and capacity, have to be managed together from planning to deployment and ultimately to delivering a workable self-service platform.  It has to be rock solid, easy to manage, and quick to scale.

Image credit: https://bs2u.wordpress.com/2012/03/03/a-car-crash-in-our-backyard/
