In our previous article, we looked at the concept of guaranteeing Quality of Service. Is this even possible? As we discussed, the first thing we needed to do was to find measurable values that could indicate tangible QoS for different systems. This time let's look closer at some challenges of specific QoS attributes. We can start with a very simple scenario: maintaining the performance of applications running inside virtual machines.
Let's look at a set of virtual machines sharing some hosts and storage. These virtual machines carry various pieces of the workload, performing specific application services. These services may have different service-level requirements, for example, response time. We could split them into tiers corresponding to different service levels, e.g. Tier 1 would include VMs with response times below 10ms and Tier 2 those with response times below 100ms.
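The tiering idea above can be sketched in a few lines. This is purely illustrative: the tier names, thresholds, and function are hypothetical, not part of any real monitoring product.

```python
# Illustrative tier definitions: maximum acceptable response time per tier,
# matching the 10ms / 100ms example in the text.
SLA_MS = {"Tier1": 10, "Tier2": 100}

def assign_tier(max_response_ms: float) -> str:
    """Pick the strictest tier whose SLA still covers the requirement."""
    # Walk tiers from tightest to loosest SLA.
    for tier, limit in sorted(SLA_MS.items(), key=lambda kv: kv[1]):
        if max_response_ms <= limit:
            return tier
    raise ValueError("no tier satisfies this requirement")

print(assign_tier(8))    # Tier1
print(assign_tier(50))   # Tier2
```

In practice a tier assignment like this would come from the service catalog rather than be computed, but the mapping from response-time requirement to tier is the same.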
Now, we could start watching their performance and think of actions to maintain the quality of service. Let's look at two VMs, VM1-Tier1 and VM2-Tier2, belonging to the corresponding tiers. Both VMs have an identical configuration: 2 vCPU, 8GB vRAM, 1TB disk.
Using traditional monitoring tools, we observe that both VMs are struggling with their virtual memory: VM1 at 85% vMEM and VM2 at 75% vMEM. You know they belong to different tiers, and when you see your Tier 1 VM struggling with high vMEM utilization, your natural inclination is to give it more resources to prevent QoS from suffering.
But let's look closer at the possible actions. You could increase the vMEM size of VM1 by adding 4GB of vRAM, potentially reducing its utilization to about 60%, a good number to accommodate load fluctuations. However, if we assume that all this memory is needed by the application workload, it could increase the physical memory consumption on the underlying host by the same amount.
The underlying host's memory is already 70% utilized, and to avoid driving utilization even higher, the ESX scheduler will try to borrow memory from other VMs via ballooning. It looks at VM2 and sees that half of its allocated pages haven't been used, a perfect target: it borrows 2GB of RAM from VM2 and gives it to VM1.
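A toy model of that decision looks like this. To be clear, this is a deliberate simplification of the idea described above (reclaim from the VM with the most idle memory), not the actual ESX ballooning algorithm; the VM names and numbers mirror the example in the text.

```python
# Hypothetical per-VM memory stats: configured vRAM vs. actively used memory.
vms = {
    "VM1-Tier1": {"vram_gb": 8, "active_gb": 7},   # ~85% utilized
    "VM2-Tier2": {"vram_gb": 8, "active_gb": 4},   # half the pages idle
}

def pick_balloon_target(vms: dict) -> str:
    """Naive heuristic: borrow from the VM with the most idle memory."""
    return max(vms, key=lambda name: vms[name]["vram_gb"] - vms[name]["active_gb"])

print(pick_balloon_target(vms))  # VM2-Tier2
```

By this metric VM2 looks like the cheapest donor, which is exactly the trap the rest of the article explores: idle memory is not necessarily unneeded memory.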
Let's assume that all other resources are fine and you are about to report success; after all, you just prevented QoS degradation. But did you? If you measure response times, you may find that VM1's response time is 5ms while VM2's is 200ms. So VM1's QoS is well within its expected service level, but VM2 is actually suffering a lot, and you just made the situation worse: VM2 now has even less vRAM.
Hmm, not exactly the expected result. What you probably didn't realize is that the VM1 application had allocated enough buffers in its large Java heap to maintain a good response time, while the VM2 application runs heavy computational algorithms and doesn't use its heap often, so that memory was garbage collected and reclaimed. But when the time comes, VM2 may need more memory that is no longer there, and its response time will suffer significantly.
A better approach, then, would be to give more resources to the application that is struggling with its service levels rather than to the one that has the higher priority but isn't struggling. After all, our goal is to adhere to the QoS of ALL supported tiers and to allocate resources in proportion to the expected service levels, not blindly according to pre-allocated priorities.
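The QoS-driven approach starts by checking actual SLA adherence rather than static priority. A minimal sketch, reusing the numbers from the example above (all names and fields are illustrative):

```python
# Illustrative tier SLAs, as in the earlier example.
SLA_MS = {"Tier1": 10, "Tier2": 100}

# Observed response times from the scenario: VM1 is fine, VM2 is not.
observed = {
    "VM1-Tier1": {"tier": "Tier1", "response_ms": 5},
    "VM2-Tier2": {"tier": "Tier2", "response_ms": 200},
}

def sla_violators(observed: dict) -> list:
    """Return the VMs whose measured response time exceeds their tier's SLA."""
    return [name for name, stats in observed.items()
            if stats["response_ms"] > SLA_MS[stats["tier"]]]

print(sla_violators(observed))  # ['VM2-Tier2']
```

Under this check, VM2 (the lower-priority VM) is the one that needs resources, which is the opposite of what the priority-driven reflex suggested.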
So this elusive QoS handling is becoming more refined; however, it is still not clear how to do it. At least the priority scheme is simple enough: you allocate resources in proportion to priority, without any actual guarantee. The approach we outlined above, driven by actual QoS adherence, gets us closer to the goal, but it poses a new challenge: exactly how much do we need to allocate to the struggling app to reach the QoS goal?
We will be looking at this and other related aspects in the next several posts, so please stay with us on the yellow brick road to QoS.
Image source: http://en.wikipedia.org/wiki/File:Cowardly_lion2.jpg