In looking at some of the reasons for virtualization’s popularity, the preceding sections identified the concept of contention: the capability to make better use of previously underutilized physical resources in a server in order to reduce the total number of physical servers deployed. For the purposes of this discussion, we can split the idea of contention into two parts: good contention and bad contention.
Good contention
Good contention is straightforward: It enables you to see positive benefits from virtualizing your servers, ultimately resulting in less time and money spent on deploying and maintaining your physical server estate.
For example, if the average CPU utilization of 6 single-CPU physical servers was 10% and none of them had concurrent peak CPU usage periods, then I would feel comfortable virtualizing those 6 servers and running them on a single server with a single CPU, the logic being that 6 × 10% = 60%, which is less than the capacity of a single server with a single CPU. I’d want to make sure there was sufficient physical memory and storage system performance available for all 6 virtual servers, but ultimately the benefit would be the ability to retire 5 physical servers.
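To make the arithmetic concrete, here is a minimal sketch of that sizing check in Python, using the same hypothetical utilization figures. A real sizing exercise would also account for memory and storage, as noted above.

```python
# A minimal sketch of the consolidation arithmetic above. The figures
# mirror the hypothetical example: six single-CPU servers averaging 10%
# CPU utilization, with no overlapping peaks assumed.

avg_cpu_utilization = [0.10] * 6   # one entry per physical server

total_demand = sum(avg_cpu_utilization)   # 0.60, i.e. 60%
host_capacity = 1.0                       # one physical CPU on the host

print(f"Combined average CPU demand: {total_demand:.0%}")
if total_demand < host_capacity:
    print("Consolidation looks feasible on average (peaks must not overlap).")
else:
    print("Average demand alone already exceeds the host's capacity.")
```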
That’s a very simple example but one that most businesses can readily understand. CPU utilization is an absolute number that is usually a good reflection of how busy a server is. Conversely, sizing the server’s memory is something to which you can’t apply such an easy consolidation methodology. Instead, you usually need to determine the total memory requirement of all the virtual servers you want to run on a host server and then ensure you have more than that amount of physical memory in the host. However, VMware’s hypervisor complicates that by offering a memory de-duplication feature that allows duplicate memory pages to be replaced with a link to a single memory page shared by several virtual servers; over-estimating the benefit this technology can deliver can result in the very performance issues you tried to avoid. For SQL Server environments that depend on access to large amounts of physical memory, trusting these hypervisor memory consolidation technologies still requires testing, so their use in sizing exercises should be minimized.
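The memory methodology can be sketched the same way. The per-VM figures below are hypothetical, and the sketch deliberately sizes the host without counting on any de-duplication savings, which is the conservative approach just described.

```python
# Hedged sketch of host memory sizing: sum each guest's requirement and
# compare it against physical memory, treating any page-sharing saving
# as a bonus rather than a budget line. All figures are hypothetical.

guest_memory_gb = [8, 8, 16, 4, 4, 8]   # per-VM memory requirements
host_memory_gb = 64

total_required = sum(guest_memory_gb)    # 48 GB
print(f"Guests need {total_required} GB; host has {host_memory_gb} GB")

if total_required <= host_memory_gb:
    print("Sized without relying on memory de-duplication.")
else:
    shortfall = total_required - host_memory_gb
    # Covering this gap would mean betting on page sharing, which is
    # exactly the over-estimation risk described above.
    print(f"Short by {shortfall} GB unless page sharing delivers savings.")
```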
Bad contention
Not all contention is good. In fact, unless you plan well, you’re more likely to have bad contention than good contention. To understand bad contention, consider the CPU utilization example from the preceding section: 6 servers with average CPU utilization of 10% being consolidated onto a single-CPU host server, resulting in an average CPU utilization for the host of around 60%. Now imagine the average CPU utilization for two of the virtual servers jumping from 10% to 40%. As a consequence, the total CPU requirement has increased from 60% to 120%. Obviously, total CPU utilization cannot be 120%, so you have a problem. Fortunately, resolving this scenario is one of the core functions of hypervisor software: how can it make it look like 120% of a CPU is in use when only 100% is actually available?
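A short sketch shows how quickly the numbers break, reusing the hypothetical figures from above with two guests spiking to 40%.

```python
# The bad-contention scenario: two of the six guests spike from 10% to
# 40% average CPU utilization, pushing total demand past the 100% the
# single physical CPU can actually deliver.

demand = [0.40, 0.40, 0.10, 0.10, 0.10, 0.10]
capacity = 1.0

total = sum(demand)                 # 1.20, i.e. 120%
queued = max(0.0, total - capacity)

print(f"Total CPU demand: {total:.0%} of {capacity:.0%} available")
print(f"Demand the hypervisor must queue or delay: {queued:.0%}")
```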
Where does the missing resource come from? Hypervisors use behaviors such as resource sharing, scheduling, and time-slicing to make each virtual server appear to have full access to its allocated physical resources all of the time. Under the hood, however, the hypervisor is busy managing resource request queues: for example, “pausing” virtual servers until they get the CPU time they need, or pre-empting requests already running on physical cores while it waits for another resource they need to become available.
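The effect of time-slicing on a guest is easy to illustrate. The model below is a deliberate simplification, not how any real hypervisor scheduler works: it just assumes one physical core shared fairly among runnable guests.

```python
# Toy model of time-slicing: with fair sharing of one physical core,
# a guest receives roughly 1/N of the core when N guests are runnable,
# so its CPU work stretches out in wall-clock time accordingly.

def wall_clock_seconds(cpu_seconds_needed: float, runnable_guests: int) -> float:
    """Time to finish when the core is split evenly among runnable guests."""
    return cpu_seconds_needed * runnable_guests

work = 2.0  # seconds of CPU time one guest needs
for contenders in (1, 2, 4):
    print(f"{contenders} runnable guest(s): {work:.0f}s of CPU work "
          f"takes ~{wall_clock_seconds(work, contenders):.0f}s")
```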
How much this contention affects the performance of virtual servers depends on how the hypervisor you’re using works. In a worst-case scenario using VMware, a virtual server with a large number of virtual CPUs can be significantly affected when running alongside a number of virtual servers with small numbers of virtual CPUs; this is due to VMware’s use of a co-scheduling algorithm to handle CPU scheduling. In the worst cases, it’s possible to see multi-second pauses of the larger virtual server while it waits for sufficient physical CPU resources to become free, indicating not only the level of attention that should be paid to deploying virtual servers, but also the kind of knowledge you should have if you’re going to run heavily utilized virtual environments.
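The co-scheduling problem can also be sketched. The model below is a strict, simplified caricature (VMware actually uses a relaxed form of co-scheduling, and the 50% per-core busy probability is an arbitrary assumption); it only shows why a wide virtual server struggles to find enough simultaneously free cores.

```python
import random

# Caricature of strict co-scheduling: a "wide" 4-vCPU guest can only be
# dispatched when 4 physical cores are free at the same instant. The 50%
# per-core busy probability is an arbitrary assumption for illustration.

random.seed(42)
physical_cores = 4
wide_vm_vcpus = 4
ticks = 10_000

blocked = 0
for _ in range(ticks):
    busy = sum(random.random() < 0.5 for _ in range(physical_cores))
    if physical_cores - busy < wide_vm_vcpus:
        blocked += 1  # not enough simultaneously free cores this tick

print(f"The {wide_vm_vcpus}-vCPU guest was blocked at "
      f"{blocked / ticks:.0%} of scheduling opportunities.")
```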
Although that VMware example is an extreme one, it does show how bad contention introduces unpredictable latency. Previously, on a host server with uncontended resources, you could effectively assume that any virtual server’s request for a resource would be fulfilled immediately because the required amount of resource was always available. When the hypervisor has to manage contention, however, a time penalty is introduced for getting access to the resource. In effect, “direct” access to the physical resource by the virtual server can no longer be assumed.
“Direct” is in quotes because virtual servers never directly allocate physical resources to themselves; however, in an uncontended situation the hypervisor has no difficulty finding the CPU time and memory they request, so the DBA can know that any performance penalty caused by virtualization is likely to be small but, most important, consistent. In a contended environment, however, the resource requirements of one virtual server can affect the performance of the others, and that penalty becomes unpredictable.