Chapter 9 Managing and Monitoring Resource Access

The idea that we can take a single physical server and host many virtual machines has a great deal of value in today's dynamic datacenter environments, but let's face it — there are limits to how many virtual machines can be hosted on an ESX Server platform. The key to making the most of your virtualization platform is understanding how resources are consumed by the virtual machines running on the host and how the host itself consumes resources. Then, there's the issue of how we, the administrators, can exercise control over the way a virtual machine or group of virtual machines uses resources.

The key resources are memory, processors, disks, and networks. When a number of virtual machines are hosted on an ESX host, each virtual machine consumes some of these resources; however, the method the ESX Server uses to arbitrate access to each resource is a bit different. This chapter will discuss how the ESX Server allocates these resources, how you can change the way these resources are allocated, and how you can monitor the consumption of these resources over time.

In this chapter you will learn to:

♦ Manage virtual machine memory

♦ Manage virtual machine CPU allocation

♦ Create and manage resource pools

♦ Configure and execute VMotion

♦ Create and manage clusters

♦ Configure and manage Distributed Resource Scheduling (DRS)

Allocating Virtual Machine Memory

One of the most significant advantages of server virtualization is the ability to allocate resources to a virtual machine based on the machine's actual performance needs. In the traditional physical server environment, a server is often provided with more resources than it really needs because it was purchased with a specific budget in mind and the server specifications were maximized for the budget provided. For example, does a DHCP server really need dual processors, 4GB of RAM, and 146GB mirrored hard drives? In most situations, the DHCP server will most certainly under-utilize those resources. In the virtual world, we can create a virtual machine better suited for the DHCP services provided by the virtual machine. For this DHCP server, then, we would assemble a virtual machine with a more suitable 1GB of RAM, access to a single CPU, and 20GB of disk space, all of which are provided by the ESX host that the virtual machine is running on. Then, we can create additional virtual machines with the resources they need to operate effectively without wasting valuable memory, CPU cycles, and disk storage. As we add more virtual machines, each machine places additional demand on the ESX Server, and the host's resources are consumed to support the virtual machines. At a certain point, either the host will run out of resources or we will need to find an alternate way to share access to a limited resource.

The Game Plan for Growth

One of the most challenging aspects of managing a virtual infrastructure is managing growth without jeopardizing performance and without overestimating resource needs. From small business to large enterprise, it is critical to establish a plan for managing virtual machine and ESX Server growth.

The easiest approach is to construct a resource consumption document that details the following:

♦ What is the standard configuration for a new virtual machine to be added to the inventory? What is the size of the operating system drive? What is the size of the data drive? How much RAM will it be allocated?

♦ What are the decision points for creating a virtual machine with specifications beyond the standard configuration?

♦ How much of a server's resources can be consumed before availability and performance levels are jeopardized?

♦ At the point where the resources for an ESX Server (or an entire cluster) are consumed, do we add a single host or multiple hosts at one time?

♦ What is the maximum size of a cluster for our environment? When does adding another host (or set of hosts) constitute building a new cluster?

Let's start with how memory is allocated to a virtual machine. Then, we'll discuss the mechanisms ESX will use to arbitrate access to the memory under contention and what you as administrator can do to change how virtual machines access memory.

When you create a new virtual machine through the VI Client, the wizard will ask you how much memory the virtual machine should have, as shown in Figure 9.1.

The amount of memory you allocate on this screen is the amount the guest operating system will see — in this example, it is 1024MB. This is the same as when you build a system and put two 512MB memory sticks into the system board. If we install Windows 2003 in this virtual machine, Windows will report 1024MB of RAM installed. Ultimately this is the amount of memory the virtual machine "thinks" that it has.

Let's assume we have an ESX Server with 4GB of physical RAM available to run virtual machines (in other words, the Service Console and VMkernel are using some RAM and there's 4GB left over for the virtual machines). In the case of our new virtual machine, it will comfortably run, leaving approximately 3GB for other virtual machines (there is some additional overhead that we will discuss later, but for now let's assume that the 3GB is available to other virtual machines).

What happens when we run three more virtual machines each configured with 1GB of RAM? Each of the additional virtual machines will request 1GB of RAM from the ESX host. At this point, four virtual machines will be accessing the physical memory.

What happens when you launch a fifth virtual machine? Will it run? The short answer is yes, but understanding why requires a look at the memory management mechanisms ESX Server employs, which are driven by default settings in each virtual machine's configuration that administrators can control.

Figure 9.1 Initial Memory settings for a virtual machine indicate the amount of RAM the virtual machine "thinks" that it has.


In the advanced settings for a virtual machine, as shown in Figure 9.2, we can see there is a setting for a reservation, a limit, and shares. In this discussion, we will examine the limit and reservation settings and then come back later to deal with the shares.

Figure 9.2 Each virtual machine can be configured with a shares value, a reservation, and a limit.


To edit the reservation, limit, or shares of a virtual machine:

1. Use the VI Client to connect to a VirtualCenter Server or directly to an ESX Server host.

2. Drill down through the inventory to find the virtual machine to be edited.

3. Right-click the virtual machine and select the Edit Settings option.

4. Click the Resources tab.

5. On the Resources tab, select the CPU or Memory options from the Settings list on the left.

6. Adjust the Shares, Reservation, and Limit values as desired.

The following sections will detail the ramifications of setting custom Reservation, Limit, and Shares values.

Memory Reservation

The Reservation value is an optional setting for each virtual machine. Note that the default value is 0MB, which has a potential impact on virtual machine performance. The Reservation amount specified on the Resources tab of the virtual machine settings is the amount of actual, physical memory that the ESX Server must provide to this virtual machine for the virtual machine to power on. A virtual machine with a reservation is guaranteed the amount of RAM configured in its Reservation setting. In our previous example, the virtual machine was configured with 1GB of RAM and the default reservation of 0MB, which means the ESX Server is not required to provide the virtual machine with any physical memory. If the ESX Server is not required to provide actual RAM to the virtual machine, then where will the virtual machine get its memory? The answer is that the ESX Server provides it from swap, or more specifically, something called VMkernel swap.

VMkernel swap is a file with a .vswp extension that is created when a virtual machine is powered on. The per-virtual machine swap files created by the VMkernel reside by default in the same datastore location as the virtual machine's configuration file and virtual disk files. By default, this file will be equal in size to the amount of RAM you configured the virtual machine with, and you will find it in the same folder where the rest of the virtual machine's files are stored, as shown in Figure 9.3.

Figure 9.3 The VMkernel creates a per-virtual machine swap file stored in the same datastore as the other virtual machine files. The swap file has a .vswp extension.


In theory, this means a virtual machine can get its memory allocation entirely from VMkernel swap, in other words from disk, resulting in virtual machine performance degradation. If the virtual machine is configured with a reservation or a limit, the size of the VMkernel swap file can differ.
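To make the sizing rule concrete, here is a minimal Python sketch. It assumes, as described above, that the swap file only needs to cover the portion of configured memory not guaranteed by a reservation; the values are taken from the running example and are illustrative only.

def vswp_size_mb(configured_mb, reservation_mb=0):
    """Approximate .vswp size: the portion of configured memory that is not
    guaranteed by a reservation and may therefore be paged to disk."""
    return max(configured_mb - reservation_mb, 0)

# The 1024MB virtual machine from the example, with the default 0MB reservation:
print(vswp_size_mb(1024))        # 1024 -- the swap file equals configured memory
# The same virtual machine with a 512MB reservation (discussed below):
print(vswp_size_mb(1024, 512))   # 512  -- only the unreserved half can be swapped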

The Speed of RAM

How slow is VMkernel swap when compared to RAM? If we make some basic assumptions regarding RAM access times and disk seek times, we can see that both appear fairly fast in terms of a human but that in relation to each other RAM is faster:

RAM access time = 10 nanoseconds (for example)

Disk seek time = 8 milliseconds (for example)

The difference between these is calculated as follows:

0.008 ÷ 0.000000010 = 800,000

RAM is accessed 800,000 times faster than disk. Or to put it another way, if RAM takes 1 second to access, then disk would take 800,000 seconds to access, or roughly nine and a quarter days (800,000 seconds ÷ 60 ÷ 60 ÷ 24).

As you can see, if virtual machine performance is your goal, it is prudent to spend your money on enough RAM to support the virtual machines you plan to run. There are other factors, but this is a very significant one.

Does this mean that a virtual machine will get all of its memory from swap when ESX Server RAM is available? No. What this means is that if an ESX host doesn't have enough RAM available to provide all of the virtual machines currently running on the host with their memory allocation, the VMkernel will page some of each virtual machine's memory out to the individual virtual machine's VMkernel swap file (VSWP), as shown in Figure 9.4.

Figure 9.4 Memory allocation for a virtual machine with 1024MB of memory configured and no reservation


How do we control how much of an individual virtual machine's memory allocation can be provided by swap and how much must be provided by real physical RAM? This is where a memory reservation comes into play.

Let's look at what happens if we decide to set a memory reservation of 512MB for this virtual machine, shown in Figure 9.5. How does this change the way this virtual machine gets memory?

Figure 9.5 A virtual machine configured with a memory reservation of 512MB


In this example, when this virtual machine is started, the host must provide at least 512MB of real RAM to support this virtual machine's memory allocation, and the host could provide the remaining 512MB of RAM from VMkernel swap. See Figure 9.6.

This ensures that a virtual machine has at least some high-speed memory available to it if the ESX host is running more virtual machines than it has actual RAM to support, but there's also a downside. If we assume that each of the virtual machines we start on this host has a 512MB reservation and we have 4GB of available RAM in the host to run virtual machines, then we will only be able to launch eight virtual machines concurrently (8 × 512MB = 4096MB). On a more positive note, if each virtual machine is configured with an initial RAM allocation of 1024MB, then we're now running virtual machines that would collectively need 8GB of RAM on a host with only 4GB.
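A quick Python sketch of the admission arithmetic above; the host capacity and per-virtual machine values are the ones from the example, and the calculation is purely illustrative.

def max_concurrent_vms(host_ram_mb, per_vm_reservation_mb):
    """Reservations must be backed by physical RAM at power-on, so the host can
    only run as many virtual machines as its RAM can cover in reservations."""
    return host_ram_mb // per_vm_reservation_mb

print(max_concurrent_vms(4096, 512))   # 8 virtual machines (8 x 512MB = 4096MB)
# Each of those eight VMs is configured with 1024MB, so collectively the guests
# believe they have 8 x 1024 = 8192MB on a host with only 4096MB to give.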

Figure 9.6 Memory allocation for a virtual machine with 1024MB of memory configured and a 512MB reservation

Memory Limit

If you look back at Figure 9.5, you will also see a setting for a memory limit. By default, all new virtual machines are created without a limit, which means the initial RAM you assigned in the wizard is the effective limit. So, what exactly is the purpose of the limit setting? The limit caps the amount of physical memory the host will ever allocate to the virtual machine; a virtual machine cannot be allocated more physical memory than is configured in its limit setting.

Let's now change the limit on this virtual machine from the unlimited default setting to 768MB, as shown in Figure 9.7.

Figure 9.7 A virtual machine configured with 1024MB of memory, a 512MB reservation, and a 768MB limit


This means that the top 256MB of RAM will always be provided by swap, as shown in Figure 9.8.

Figure 9.8 Memory allocation for a virtual machine with 1024MB of memory configured, a 512MB reservation, and a 768MB limit


Think about the server administrator who wants a new virtual machine with 16GB of RAM. You know his application doesn't need it, but you can't talk him out of his request, and worse than that, your supervisor has decided you need to build a virtual machine that actually has 16GB of RAM. Consider creating the virtual machine with an initial allocation of 16GB, and set a reservation of 1GB and a limit of 2GB. The operating system installed in the virtual machine will report 16GB of RAM (making that person happy and keeping your supervisor happy, too). The virtual machine will always consume 1GB of host memory, and if your host has available RAM, the virtual machine might consume up to 2GB of real physical memory, with the top 14GB always provided by VMkernel swap. Of course, the virtual machine should not be expected to perform as if all 16GB of its memory were physical RAM.
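A small Python sketch can make the resulting split explicit. This is only an illustration of the allocation rule described above, using the example's hypothetical 16GB virtual machine.

def memory_breakdown(configured_mb, reservation_mb, limit_mb=None):
    """Split a virtual machine's configured memory into the portion guaranteed
    to be physical RAM, the portion that may be physical RAM if the host has it
    to spare, and the portion that will always come from VMkernel swap."""
    if limit_mb is None:          # "Unlimited": the configured size is the effective limit
        limit_mb = configured_mb
    guaranteed = reservation_mb
    possible_ram = limit_mb - reservation_mb
    always_swap = configured_mb - limit_mb
    return guaranteed, possible_ram, always_swap

# The 16GB virtual machine with a 1GB reservation and a 2GB limit:
print(memory_breakdown(16384, 1024, 2048))   # (1024, 1024, 14336)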

Working together, an initial allocation of memory, a memory reservation, and a memory limit can be powerful tools in efficiently managing the memory available on an ESX Server host.

Memory Shares

In Figure 9.5, there was a third setting called Shares that we have not discussed. The share system in VMware is a proportional share system that provides administrators with a means of assigning resource priority to virtual machines. For example, with memory settings, shares are a way of establishing a priority for a virtual machine requesting memory that is above the virtual machine's reservation but below its limit. In other words, if two virtual machines want more memory than their reservations and the ESX host can't satisfy both of them using RAM, we can set share values on each virtual machine so that one gets higher-priority access to the RAM in the ESX host than the other. Some would say that you should just increase the reservation for that virtual machine. While that may be a valid technique, it may limit the total number of virtual machines that a host can run, as indicated earlier in this chapter. Increasing the limit also requires a reboot of the virtual machine to become effective, but shares can be dynamically adjusted while the virtual machine remains powered on.

For the sake of this discussion, let's assume we have two virtual machines (VM1 and VM2) each with a 512MB Reservation and a 1024MB Limit, and both running on an ESX host with less than 2GB of RAM available to the virtual machines. If the two virtual machines in question have an equal number of shares (let's assume it's 1000 each), then as each virtual machine requests memory above its reservation value, each virtual machine will receive an equal quantity of RAM from the ESX host and, because the host cannot supply all of the RAM to both virtual machines, each virtual machine will swap equally to disk (VMkernel pagefile VSWP).

If we change VM1's Shares setting to 2000, then VM1 now has twice the shares VM2 has assigned to it. This also means that when VM1 and VM2 are requesting RAM above their respective reservation values, VM1 will swap one page to the VMkernel pagefile for every two pages that VM2 swaps. Stated another way, VM1 gets two RAM pages for every one RAM page that VM2 gets. Because VM1 has more shares, VM1 has higher-priority access to available memory in the host.
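The two-to-one behavior described above is simple proportional arithmetic. Here is a minimal Python sketch of that proportional-share idea; it is an illustration of the concept, not VMware's implementation.

def share_split(share_values):
    """Proportional-share arbitration: each contending virtual machine receives
    the contended resource in proportion to its share value."""
    total = sum(share_values.values())
    return {vm: shares / total for vm, shares in share_values.items()}

# VM1 with 2000 shares and VM2 with 1000 shares contending above their reservations:
print(share_split({"VM1": 2000, "VM2": 1000}))
# {'VM1': 0.666..., 'VM2': 0.333...} -- two RAM pages for VM1 for every one for VM2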

It gets more difficult to predict the actual memory utilization and the amount of access each virtual machine gets as more virtual machines run on the same host. Later in this chapter we will discuss more sophisticated methods of assigning memory limits, reservations, and shares to a group of virtual machines using resource pools.

Allocating Virtual Machine CPU

When creating a new virtual machine using the VI Client, the only question you are asked related to CPU is, “Number of virtual processors?”, as shown in Figure 9.9.

Figure 9.9 When a virtual machine is created, the wizard provides the opportunity to configure the virtual machine with one, two, or four virtual CPUs.


The CPU setting effectively lets the guest operating system utilize one, two, or four virtual CPUs on the host system. When the VMware engineers designed the virtualization platform, they started with a real system board and modeled the virtual machine after it; in this case, it was based on the Intel 440BX chipset. The PCI bus was something the virtual machine could emulate and map to input/output devices through a standard interface, but how could a virtual machine emulate a CPU? The answer was “no emulation.” Think of a virtual system board that has a “hole” where the CPU socket goes; the guest operating system simply looks through the hole and sees one of the cores in the host server. This allowed the VMware engineers to avoid writing CPU emulation software that would need to change each time the CPU vendors introduced new instruction sets. An emulation layer would also have added significant computational overhead, limiting the performance of the virtualization platform.

So how many CPUs should a virtual machine have? A virtual machine that replaces a physical DHCP server running at less than 10 percent CPU utilization at its busiest point in the day surely does not need more than one virtual CPU. As a matter of fact, if we give this virtual machine two virtual CPUs (vCPUs), we would effectively limit the scalability of the entire host.

The VMkernel simultaneously schedules CPU cycles for multi-CPU virtual machines. This means that when a dual-CPU virtual machine places a request for CPU cycles, the request goes into a queue for the host to process, and the host has to wait until there are at least two cores with concurrent idle cycles to schedule that virtual machine. This occurs even if the virtual machine only needs a few clock cycles to do some menial task that could be done with a single processor. Think about it this way: You need to cash a check at the bank, but because of the type of account you have, you need to wait in line until two bank tellers are available at the same time. Normally, one teller could handle your request and you would be on your way — but now you have to wait. What about the folks behind you in the queue as you wait for two tellers? They are also waiting longer because of you.

On the other hand, if a virtual machine needs two CPUs because of the load it will be processing on a constant basis, then it makes sense to assign two CPUs to that virtual machine — but only if the host has four or more CPU cores total. If your ESX host is an older generation dual-processor single-core system, then assigning a virtual machine two vCPUs will mean that the virtual machine owns all of the CPU processing power on that host every time it gets CPU cycles. You will find that the overall performance of the host and any other virtual machines will be less than stellar.
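Before moving on, here is a rough Python illustration of the co-scheduling requirement described above. It is a deliberately simplified model, not how the VMkernel scheduler is actually implemented.

def can_co_schedule(idle_cores, vcpus):
    """A multi-vCPU virtual machine is dispatched only when at least that many
    physical cores are idle at the same moment (simplified model)."""
    return idle_cores >= vcpus

# A dual-vCPU VM on a host where only one core happens to be idle right now:
print(can_co_schedule(idle_cores=1, vcpus=2))   # False -- the VM waits in the queue
print(can_co_schedule(idle_cores=2, vcpus=2))   # True  -- both vCPUs run together
print(can_co_schedule(idle_cores=1, vcpus=1))   # True  -- a single-vCPU VM is easier to place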

One (CPU) for All… at Least to Begin With

Every virtual machine should be created with only a single virtual CPU so as not to create unnecessary contention for physical processor time. Only when a virtual machine's performance level dictates the need for an additional CPU should one be allocated. Remember that multi-CPU virtual machines should only be created on ESX Server hosts that have at least as many cores as the number of virtual CPUs being assigned to the virtual machine: a dual-CPU virtual machine should only be created on a host with two or more cores, and a quad-CPU virtual machine should only be created on a host with four or more cores.

Default CPU Allocation

Like the memory settings discussed, the settings Shares, Reservation, and Limit can be configured for CPU. Figure 9.10 shows the default values for CPU Resource settings.

Figure 9.10 A virtual machine's CPU can be configured with Shares, Reservation, and Limit values.


When a new virtual machine is created with a single vCPU, the total maximum CPU cycles for that virtual machine equals the clock speed of the host system's core. In other words, if you create a new virtual machine, it can see through the “hole in the system board” and it sees whatever the core is in terms of clock cycles per second — an ESX host with 3GHz CPUs in it will allow the virtual machine to see one 3GHz core.

CPU Reservation

As shown in Figure 9.10, the default CPU reservation for a new virtual machine starts at 0MHz. Therefore, by default a virtual machine is not guaranteed any CPU activity by the VMkernel. This means that when the virtual machine has work to be done, it places its CPU request into the CPU queue so that the VMkernel can handle the request in sequence along with all of the other virtual machines' requests. On a lightly loaded ESX host, it's unlikely the virtual machine will wait long for CPU time; however, on a heavily loaded host, the time this virtual machine may have to wait could be significant.

If we were to set a 300MHz reservation, as shown in Figure 9.11, this would effectively make that amount of CPU available instantly to this virtual machine if there is a need for CPU cycles.

Figure 9.11 A virtual machine configured with a 300 MHz reservation for CPU activity


This option also has another effect, similar to that of setting a memory reservation. If each virtual machine you create has a 300 MHz reservation and your host has 6000 MHz of CPU capacity, you can run no more than 20 virtual machines concurrently (20 × 300 MHz = 6000 MHz), even if all of them are idle. The host system must be able to satisfy all of the reservation values concurrently. Now, does that mean each virtual machine is limited to its 300 MHz? Absolutely not, and that's the good news. If VM1 is idle and VM2 needs more than its CPU reservation, the ESX host will schedule more clock cycles to VM2. If VM1 suddenly needs its cycles, VM2 no longer gets them and they are assigned to VM1.

CPU Limit

Every virtual machine has an option that you can set to place a limit on the amount of CPU allocated. This will effectively limit the virtual machine's ability to see a maximum number of clock cycles per second, regardless of what the host has available. Keep in mind that a single virtual CPU virtual machine hosted on a 3GHz, quad-processor ESX Server will only see a single 3GHz core as its maximum, but as administrator you could alter the limit to hide the actual maximum core speed from the virtual machine. For instance, you could set a 500 MHz limit on that DHCP server so that when it reindexes the DHCP database, it won't try to take all of the 3GHz on the processor that it can see. The CPU limit provides you with the ability to show the virtual machine less processing power than is available on a core on the physical host. Not every virtual machine needs to have access to the entire processing capability of the physical processor core.

Real World Scenario

Increasing Contention in the Face of Growth

One of the most common problems administrators encounter occurs when several virtual machines without limits are deployed in a new virtualized environment. The users get accustomed to stellar performance levels early in the environment lifecycle, but as more virtual machines are deployed and start to compete for CPU cycles, the relative performance of the first virtual machines deployed will degrade. One approach to this issue is to set a reservation of approximately 10 to 20 percent of a single core's clock rate and add approximately 20 percent to that value for a limit on the virtual machine. For example, with 3GHz CPUs in the host, each virtual machine would start with a 300 MHz reservation and a 350 MHz limit. This ensures that the virtual machine will perform much the same on a lightly loaded ESX host as it will when that host becomes more heavily loaded. Consider setting these values on the virtual machine that you use to create a template, since they will pass to any new virtual machines deployed from that template. Please note that this is only a starting point. It is possible to limit a virtual machine that really does need more CPU capability, so you should always actively monitor the virtual machines to determine if they are using all of the CPU you are providing them.

If the numbers seem low, feel free to increase them as needed. More important is the concept of setting expectations for virtual machine performance.

CPU Shares

In a manner similar to memory allocation, you can assign CPU share values to a virtual machine. The shares for CPU will determine how much CPU is provided to a virtual machine in the face of contention with other virtual machines needing CPU activity. All virtual machines, by default, start with an equal number of shares, which means that if there is competition for CPU cycles on an ESX host, each virtual machine gets serviced with equal priority. Keep in mind that this share value only affects CPU cycles that are above the reservation set for the virtual machine. In other words, the virtual machine is granted access to its reservation cycles regardless of what else is happening on the host, but if the virtual machine needs more — and there's competition — then the share values come into play.

Several conditions have to be met for shares to even be considered for allocating CPU cycles. The best way to determine this is to consider several examples. For the examples to be covered, we will assume the following details about the environment:

♦ The ESX Server host includes dual single-core, 3GHz CPUs.

♦ The ESX Server host has one or more virtual machines.

Scenario 1 The ESX host has a single virtual machine running. The shares are set at default for the running virtual machines. Will the shares value have any effect in this scenario? No — there's no competition between virtual machines for CPU time.

Scenario 2 The ESX host has two idle virtual machines running. The shares are set at default for the running virtual machines. Will the shares values have any effect in this scenario? No — there's no competition between virtual machines for CPU time as both are idle.

Scenario 3 The ESX host has two equally busy virtual machines running (both requesting maximum CPU capabilities). The shares are set at default for the running virtual machines. Will the shares values have any effect in this scenario? No. Again, there's no competition between virtual machines for CPU time, this time because each virtual machine is serviced by a different core in the host.

Scenario 4 To force contention, both virtual machines are configured to use the same CPU by setting the CPU affinity, shown in Figure 9.12. The ESX host has two equally busy virtual machines running (both requesting maximum CPU capabilities). This ensures contention between the virtual machines.

Figure 9.12 CPU affinity can tie a virtual machine to physical CPU at the expense of eliminating VMotion capability.


The shares are set at default for the running virtual machines. Will the shares values have any effect in this scenario? Yes, the shares now matter because the virtual machines are contending for the same core. But since both virtual machines have equal share values, each has equal access to the host's CPU queue, so we don't see any noticeable effect from the share values.

Scenario 5 The ESX host has two equally busy virtual machines running (both requesting maximum CPU capabilities with CPU affinity set to the same core). The shares are set as follows: VM1 = 2000 CPU shares and VM2 is set to the default 1000 CPU shares. Will the shares values have any effect in this scenario? Yes. In this case, VM1 has double the number of shares that VM2 has. This means that for every clock cycle that VM2 is assigned by the host, VM1 is assigned two clock cycles. Stated another way, out of every three clock cycles assigned to virtual machines by the ESX host: two are assigned to VM1 and one is assigned to VM2.

CPU Affinity Settings

If the option for CPU affinity is not present on a virtual machine, then check if this virtual machine is being hosted by a DRS cluster. CPU affinity is one of the items that must not be set for VMotion to function, and DRS uses VMotion to perform load balancing across the cluster. If CPU affinity is required on a virtual machine, it cannot be hosted by a DRS cluster. In addition, if you have CPU affinity set on a virtual machine and you then enable DRS, it will remove those CPU affinity settings. The CPU affinity setting should be avoided at all costs. Even if a virtual machine is configured to use a single CPU (for example, CPU1), it does not guarantee that it will be the only virtual machine accessing that CPU unless every other virtual machine is configured not to use that CPU. At this point, VMotion capability will be unavailable for every virtual machine. In short, don't do it. It's not worth losing VMotion. Use shares, limits, and reservations as an alternative.

Scenario 6 The ESX host has three equally busy virtual machines running (each requesting maximum CPU capabilities with CPU affinity set to the same core). The shares are set as follows: VM1 = 2000 CPU shares and VM2 and VM3 are set to the default 1000 CPU shares. Will the shares values have any effect in this scenario? Yes. In this case, VM1 has double the number of shares that VM2 and VM3 have assigned. This means that for every two clock cycles that VM1 is assigned by the host, VM2 and VM3 are each assigned a single clock cycle. Stated another way, out of every four clock cycles assigned to virtual machines by the ESX host, two cycles are assigned to VM1, one is assigned to VM2, and one is assigned to VM3. You can see that adding a third busy virtual machine has effectively watered down VM1's CPU capabilities.

Scenario 7 The ESX host has three virtual machines running. VM1 is idle while VM2 and VM3 are equally busy (each requesting maximum CPU capabilities, and all three virtual machines are set with the same CPU affinity). The shares are set as follows: VM1 = 2000 CPU shares and VM2 and VM3 are set to the default 1000 CPU shares. Will the shares values have any effect in this scenario? Yes. But in this case VM1 is idle, which means it isn't requesting any CPU cycles. This means that VM1's shares value is not considered when apportioning host CPU to the active virtual machines. In this case, VM2 and VM3 would equally share the host CPU cycles as their shares are set to an equal value.
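The scenarios above follow a single rule: contended cycles are divided among the virtual machines that are actively requesting them, in proportion to their shares. Here is a short Python sketch of that rule, using the share values from Scenarios 6 and 7; it is a simplified model, not the VMkernel scheduler itself.

def cycle_split(share_values, active):
    """Apportion contended CPU cycles among the virtual machines that are
    actually requesting them; idle machines' shares are ignored (Scenario 7)."""
    contenders = {vm: s for vm, s in share_values.items() if vm in active}
    total = sum(contenders.values())
    return {vm: s / total for vm, s in contenders.items()}

shares = {"VM1": 2000, "VM2": 1000, "VM3": 1000}
# Scenario 6: all three virtual machines are busy on the same core.
print(cycle_split(shares, active={"VM1", "VM2", "VM3"}))  # VM1 0.5, VM2 0.25, VM3 0.25
# Scenario 7: VM1 is idle, so only VM2 and VM3 contend.
print(cycle_split(shares, active={"VM2", "VM3"}))         # VM2 0.5, VM3 0.5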

Given these scenarios, if we were to extrapolate to an eight-core host with 30 or so virtual machines it would be difficult to set share values on a virtual machine-by-virtual machine basis and to predict how the system will respond. Additionally, if the scenario were played out on a DRS cluster, where virtual machines can dynamically move from host to host based on available host resources, it would be even more difficult to predict how an individual virtual machine would get CPU resources based strictly on the share mechanisms. The question then becomes, “Are shares a useful tool?” The answer is yes, but in large enterprise environments, we need to examine resource pools and the ability to set share parameters along with reservations and limits on collections of virtual machines. And with that, let's introduce resource pools.

Resource Pools

The previously discussed settings for virtual machine resource allocation (memory and CPU reservations, limits, and shares) are methods used to designate the priority of an individual virtual machine compared to other virtual machines also seeking access to resources. In much the same way as we assign users to groups and then assign permissions to the groups, we can leverage resource pools to make the allocation of resources to collections of virtual machines a less tedious and more effective process.

A resource pool is a special type of container object, much like a folder, in the Hosts & Clusters view of inventory. We can create a resource pool on a stand-alone host or as a management object in a DRS cluster (discussed later in this chapter). Figure 9.13 shows the creation of a resource pool.

If you examine the properties of the resource pool, as shown in Figure 9.14, you'll see there are two sections: one for CPU settings (reservation, limit, and shares) and another section with similar settings for memory.

Figure 9.13 Resource pools can be created on individual hosts and within clusters. A resource pool provides a management and performance configuration layer in VirtualCenter inventory.


Figure 9.14 A resource pool is used for managing CPU and memory resources for multiple virtual machines contained within the resource pool.


To describe the function of resource pools, consider the following example. A company has two main classifications of servers: production and development. The goal of resource allocation in this scenario is to ensure that if there's competition for a particular resource, the virtual machines in production should be assigned higher-priority access to that resource. In addition to that goal, we need to ensure that the virtual machines in development cannot consume more than 4GB of physical memory with their running virtual machines. We don't care how many virtual machines run concurrently as part of the development group as long as they don't collectively consume more than 4GB of RAM.

The first step in creating the infrastructure to support the outlined goal is to create two resource pools: one called ProductionVMs and one called DevelopmentVMs, shown in Figure 9.15.

Figure 9.15 Two resource pools created on an ESX Server host


We will then modify the resource pool settings for each resource pool to reflect the goals of the organization, as shown in Figures 9.16 and 9.17.

Figure 9.16 The ProductionVMs resource pool is configured to be able to consume more resources in the face of contention.


As a final step in configuring the environment, the virtual machines must be moved into the appropriate resource pool by clicking on the virtual machine in the inventory panel and dragging it onto the appropriate resource pool. The result will be similar to that shown in Figure 9.18.

Now that we've got an example to work with, let's examine what the settings on each of these resource pools will do for the virtual machines contained in each of the resource pools.

In Figure 9.16, we set the ProductionVMs CPU shares to High (8000). In Figure 9.17, we set the DevelopmentVMs CPU shares to Low (2000). The effect of these two settings is similar to that of comparing two virtual machines' CPU share values — except in this case, if there is any competition for CPU resources between virtual machines in the ProductionVMs and the DevelopmentVMs resource pool, the ProductionVMs would have higher priority. To make this clearer, let's examine Figure 9.19 with the assumption that all the available virtual machines are competing for CPU cycles on the same physical CPU. Remember that share allocations only come into play when virtual machines are fighting one another for a resource. If an ESX Server host is only running four virtual machines on top of two dual-core processors, there won't be much contention to manage.

If there are two or more virtual machines in a resource pool that require different priority access to resources, we can still set individual Shares values on the virtual machines, or if we have groupings of virtual machines within a resource pool, we can also build a resource pool in a resource pool. A resource pool within a resource pool is called a child resource pool, and it contains its own shares, limits, and reservations separate from the parent resource pool.

Figure 9.17 The DevelopmentVMs resource pool is configured not to be able to consume more resources in the face of contention.


Figure 9.18 Virtual machines assigned to a resource pool consume resources allocated to the resource pool.


The next setting in the Resource Pool properties to evaluate is the CPU Reservation (Figure 9.14). In the example, a CPU Reservation value of 1000 MHz has been set on the ProductionVMs resource pool. This ensures that at least 1000 MHz of CPU time is available for all of the virtual machines located in that resource pool. This setting has an additional effect: assuming that the ESX Server host has a total of 6000 MHz of CPU, 5000 MHz of CPU time remains available to other resource pools. If two more resource pools are created with a reservation value of 2500 MHz each, then the cumulative reservations on the system have reserved all available host CPU (1000 + 2500 + 2500). This essentially means the administrator will not be able to create additional resource pools with reservation values set.

The third setting on the Resource Pool is the CPU Limit field. This is similar to the individual virtual machine's CPU Limit field, except in this case all of the virtual machines in the resource pool combined can consume up to this value. The limit applies to the collective sum of the virtual machines within the resource pool. In the example, the ProductionVMs resource pool has been configured with a CPU limit of 3000 MHz. In this case, no matter how many virtual machines are running in this resource pool, they can only consume a maximum of 3000 MHz of host CPU cycles.

Figure 9.19 Two resource pools with different Shares values will be allocated resources proportional to their percentage of share ownership.


Each resource pool also includes a setting to determine if the pool has an Expandable Reservation. The Expandable Reservation dictates whether a resource pool is allowed to ask the parent pool for more resources once it has consumed all its allocated resources. Consider a child who is given a weekly allowance of $10 by their parent. Suppose the child wants to make a purchase that costs $20. If the child's allowance is set to allow Expandable Reservations, then the child could ask the parent for the additional resource, and if the parent has the resource, it will be given. If the child's allowance is not set to allow Expandable Reservations, then they cannot ask the parent for additional resources.

Therefore, if the intent is to limit the total amount of CPU available to virtual machines in a resource pool, then the Expandable Reservation check box should be left empty. If left selected, a new virtual machine with a reservation configured that exceeds the capacity of the resource pool will be powered on if the parent is able to provide the necessary resource. In this case, the pool has not provided a hard cap on the amount of reserved resources.
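The allowance analogy above maps naturally onto a simple admission check. The following Python sketch is a simplified model of how an expandable reservation behaves, not VMware's actual admission-control code; the numbers are hypothetical.

def can_power_on(vm_reservation_mb, pool_free_mb, parent_free_mb, expandable):
    """Admission check for a reservation inside a resource pool: if the pool
    cannot cover the reservation itself, an expandable reservation lets it
    borrow the shortfall from its parent (simplified model)."""
    if vm_reservation_mb <= pool_free_mb:
        return True
    if expandable:
        return (vm_reservation_mb - pool_free_mb) <= parent_free_mb
    return False

# A 512MB reservation against a pool with 256MB left and a parent with 2GB free:
print(can_power_on(512, pool_free_mb=256, parent_free_mb=2048, expandable=True))   # True
print(can_power_on(512, pool_free_mb=256, parent_free_mb=2048, expandable=False))  # False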

Moving on to the Memory portion of the Resource Pool settings, the first setting is the Shares value. This setting works in much the same way as Memory Shares worked on individual virtual machines: it determines which pool of virtual machines will be the first to page in the face of contention. In this case, however, the setting establishes a priority value for any virtual machine in the resource pool when competing for resources with virtual machines in other resource pools. Looking at the Memory Share settings in our example (ProductionVMs = Normal and DevelopmentVMs = Low), this generally means that if host memory were limited, virtual machines in the DevelopmentVMs resource pool that need more memory than their reservation would use more pages in VMkernel swap than an equivalent virtual machine in the ProductionVMs resource pool.

The Memory Reservation value reserves that amount of host RAM for the virtual machines running in the resource pool, which effectively guarantees those virtual machines some amount of actual RAM.

The Memory Limit value is an excellent way of setting a cap on how much host RAM a particular set of virtual machines can consume. If administrators have been given the “Create Virtual Machines” permission in the DevelopmentVMs resource pool, then the Memory Limit value prevents the virtual machines those administrators run from consuming more than that amount of actual host RAM. In our example, the Memory Limit value on the DevelopmentVMs resource pool is set to 1024MB. How many virtual machines can the administrators in Development create? As many as they wish; the limit has nothing to do with the creation of virtual machines. So how many can they run? The cap placed on memory use is not a per-virtual machine setting but a cumulative one, so they might run a single virtual machine that uses all of the memory or several virtual machines with lower memory configurations. In fact, assuming that each virtual machine is created without an individual Memory Reservation value, an administrator can run as many virtual machines concurrently as she wants; the catch is that once the virtual machines collectively consume 1024MB of host RAM, anything above that amount must be provided by VMkernel swap. If she builds four virtual machines with 256MB as the initial memory amount, then all four virtual machines will consume 1024MB (assuming no overhead) and will run in real RAM. If she tries to run 20 of the same type of virtual machine, then all 20 virtual machines will share the 1024MB of RAM even though their collective requirement is 5120MB (20 × 256MB); the remaining 4096MB would be provided by VMkernel swap, and at that point performance might be noticeably slow.
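As a rough check on that arithmetic, here is a small Python sketch; the pool limit and virtual machine sizes are the ones from the example above, and per-virtual machine overhead is ignored just as it is in the text.

def pool_swap_pressure(vm_memory_mb, pool_limit_mb):
    """Estimate how much of a resource pool's collective memory demand must be
    served from VMkernel swap once the pool's memory limit is exhausted."""
    demand = sum(vm_memory_mb)
    return max(demand - pool_limit_mb, 0)

# Four 256MB virtual machines fit within the 1024MB limit:
print(pool_swap_pressure([256] * 4, 1024))    # 0 -- everything runs in real RAM
# Twenty of them demand 5120MB, so 4096MB must come from VMkernel swap:
print(pool_swap_pressure([256] * 20, 1024))   # 4096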

The Unlimited check box should be cleared to set a limit value. The Expandable Reservation check box functions in the same way as the equivalent CPU setting. If you truly want to limit the resource pool's memory, then clear this check box.

Go Big, if Just for a Moment

An Expandable Reservation may not seem that useful given the comments in the text. However, think about temporarily allowing a resource pool to exceed its allocation by using an expandable reservation. Consider a scenario where the Infrastructure resource pool needs more memory to deploy a new Windows 2003 virtual machine. The administrators will use this new virtual machine to retire a Windows 2000 virtual machine, and they will be doing the migration over the weekend. Simply select the Expandable Reservation option on Friday to give them room to run both virtual machines at the same time and allow the data migration from the old virtual machine to the new one. After the weekend, verify that they have shut down the old virtual machine, clear the Expandable Reservation check box, and everything is back to normal.

Memory Overhead

As they say, nothing in this world is free. Several basic processes on an ESX host consume host memory: the VMkernel itself, the Service Console (272MB by default, 800MB maximum), and each running virtual machine, for which the VMkernel allocates some overhead memory above the initial amount assigned to it. The amount of overhead RAM allocated for each virtual machine depends on the configuration of that virtual machine, as shown in Table 9.1.

Exploring VMotion

We've defined the VMware VMotion feature as the ability to perform a hot migration of a virtual machine from one ESX Server host to another without service interruption. This is an extremely effective tool for load-balancing virtual machines across ESX Server hosts. Additionally, if an ESX Server host needs to be powered off for hardware maintenance or some other function that would take it out of production, VMotion can be used to migrate all active virtual machines from the host going offline to another host without waiting for a hardware maintenance window since the virtual machines will remain available to the users that need them.


Table 9.1: Virtual Machine Memory Overhead

Virtual CPUs   Memory Assigned (MB)   Overhead for 32-bit Virtual Machine (MB)   Overhead for 64-bit Virtual Machine (MB)
1 256 79 174
1 512 79 176
1 1024 84 180
1 2048 91 188
1 4096 107 204
1 8192 139 236
1 16384 203 300
2 256 97 288
2 512 101 292
2 1024 101 300
2 2048 125 316
2 4096 157 349
2 8192 221 413
2 16384 349 541
4 256 129 511
4 512 133 515
4 1024 141 523
4 2048 157 540
4 4096 189 572
4 8192 222 605
4 16384 350 734
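The overhead values in Table 9.1 can be folded into rough capacity planning. The following Python sketch estimates host RAM demand for a mix of virtual machines using a few rows from the table; it optimistically assumes every virtual machine's configured memory is fully backed by physical RAM, with no VMkernel swap involved.

# Overhead values taken from Table 9.1 (32-bit guests), keyed by (vCPUs, memory MB).
overhead_mb = {(1, 1024): 84, (1, 2048): 91, (2, 2048): 125}

def host_footprint(vms):
    """Rough host RAM demand for a set of virtual machines: configured memory
    plus the per-virtual machine overhead from Table 9.1."""
    return sum(mem + overhead_mb[(vcpus, mem)] for vcpus, mem in vms)

# Three single-vCPU 1GB virtual machines and one dual-vCPU 2GB virtual machine:
print(host_footprint([(1, 1024)] * 3 + [(2, 2048)]))   # 5497MB of host RAM demand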

VMotion works by copying the contents of virtual machine memory from one ESX host to another and then transferring control of the virtual machine's disk files to the target host.

VMotion operates in the following sequence of steps:

1. An administrator initiates a migration of a running virtual machine (VM1) from one ESX Server (Silo104) to another (Silo105), shown in Figure 9.20.

Figure 9.20 Step 1 in a VMotion migration: invoking a migration while the virtual machine is powered on.


2. The source host (Silo104) begins copying the active memory pages VM1 has in host memory to the destination host (Silo105). During this time, the virtual machine still services clients on the source (Silo104). As the memory is copied from the source host to the target, pages in memory can be changed. ESX Server handles this by keeping a log of changes that occur in the memory of the virtual machine on the source host after that memory address has been copied to the target host. This log is called a memory bitmap, as shown in Figure 9.21.

The Memory Bitmap

The memory bitmap does not include the contents of the memory addresses that have changed; it simply includes the addresses of the memory that has changed, often referred to as the “dirty memory.”

3. Once the entire contents of RAM for the virtual machine being migrated have been transferred to the target host (Silo105), VM1 on the source ESX Server (Silo104) is quiesced. This means that it is still in memory but is no longer servicing client requests for data. The memory bitmap file is then transferred to the target (Silo105). See Figure 9.22.

4. The target host (Silo105) reads the addresses in the memory bitmap file and requests the contents of those addresses from the source (Silo104). See Figure 9.23.

Figure 9.21 Step 2 in a VMotion migration: starting the memory copy and adding a memory bitmap


Figure 9.22 Step 3 in a VMotion migration: quiescing VM1 and transferring the memory bitmap file from the source ESX host to the destination ESX host


Figure 9.23 Step 4 in a VMotion migration: fetching the actual memory listed in the bitmap file from the source to the destination (dirty memory)


5. Once the contents of the memory referred to in the memory bitmap file have been transferred to the target host, the virtual machine starts on that host. Note that this is not a reboot; the virtual machine's state is in RAM, so the host simply enables it. The target host then sends a Reverse Address Resolution Protocol (RARP) message to register the virtual machine's MAC address against the physical switch port the target ESX Server is plugged into. This process enables the switch infrastructure to send network packets from the clients attached to the virtual machine to the appropriate ESX host after the move. See Figure 9.24.

6. Once the virtual machine is successfully operating on the target host, the memory the virtual machine was using on the source host is deleted. This memory becomes available to the VMkernel to use as appropriate. See Figure 9.25.
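The memory bitmap mechanism in steps 2 through 4 can be illustrated with a toy Python model. It is purely illustrative: the page count and dirty fraction are invented values, and the real VMotion implementation is more involved than this sketch.

import random

def vmotion_toy(total_pages=262144, dirty_fraction=0.02):
    """Toy walkthrough of steps 2 through 5: copy every page while the VM runs,
    record the addresses the guest dirties in a bitmap, quiesce the VM, then
    fetch only the dirty pages' contents."""
    # Step 2: full copy while the VM keeps servicing clients; pages the guest
    # dirties after they have been copied are recorded (by address) in the bitmap.
    bitmap = {page for page in range(total_pages) if random.random() < dirty_fraction}
    pages_sent_while_running = total_pages
    # Step 3: quiesce the VM and send the bitmap (addresses only) to the target.
    # Step 4: the target fetches the contents of just the dirty pages.
    pages_sent_while_quiesced = len(bitmap)
    return pages_sent_while_running, pages_sent_while_quiesced

live, quiesced = vmotion_toy()
print(live, quiesced)   # e.g. 262144 pages copied live, roughly 5,000 fetched after quiescing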

Try It with PING -t

If you follow the previous procedure carefully, you'll note there will be a moment when the virtual machine being moved is not running on either the source host or the target host. This is typically a very short period of time. Testing has shown that a continuous ping (ping -t) of the virtual machine being moved might, on a bad day, result in the loss of one ping packet. Most client-server applications are built to withstand the loss of more than a packet or two before the client is notified of a problem.

Figure 9.24 Step 5 in a VMotion migration: enabling the virtual machine on the target host and registering with the network infrastructure


Figure 9.25 Step 6 in a VMotion migration: deleting the virtual machine from the source ESX host

VMotion Requirements

The VMotion migration is pretty amazing, and when they see it work for the first time in a live environment, most people are extremely impressed. However, detailed planning is necessary for this procedure to function properly. The hosts involved in the VMotion process have to meet certain requirements, along with the virtual machines being migrated.

Each of the ESX Server hosts that are involved in VMotion must meet the following requirements:

♦ Shared VMFS storage for the virtual machine files

♦ A gigabit network card with a VMkernel port defined and enabled for VMotion on each host (see Figure 9.26 through Figure 9.32)

Perform these steps to create a virtual switch with a VMotion-capable VMkernel port:

1. Add a new switch that includes a VMkernel port, as shown in Figure 9.26.

Figure 9.26 A VMotion enabled VMkernel port is required to perform a hot migration of a virtual machine.


2. Choose the network adapter that is connected to the VMotion network, as shown in Figure 9.27.

Figure 9.27 VMotion requires a virtual switch associated to a physical network adapter, preferably on a dedicated physical network.


3. Enable the Use This Port Group for VMotion option and assign an IP address appropriate for the VMotion network, as shown in Figure 9.28.

Figure 9.28 The vSwitch with the VMkernel port group for VMotion must be VMotion capable.


4. Click Finish. Add a default gateway if needed, as shown in Figures 9.29 and 9.30.

Figure 9.29 Finish the VMkernel port configuration to allow VMotion migrations.


Figure 9.30 A default gateway can be created for the VMkernel port if one is required. VMkernel ports with VMotion enabled should not require a default gateway.


If the VMotion network is nonroutable, leave the default gateway blank or simply use the default gateway assigned for the Service Console, as shown in Figure 9.31.

Figure 9.31 VMotion networks without a router do not need a default gateway, or the same default gateway as the Service Console can be entered.


A successful VMotion migration between two hosts relies on all of the following conditions being met:

♦ Both the source and destination hosts must be configured with identical virtual switches that have VMotion-enabled VMkernel port groups. The names of the switches must be the same, as shown in Figure 9.32.

Figure 9.32 The two hosts involved in the VMotion migration must have similarly configured VMotion enabled virtual switches.


♦ All port groups to which the virtual machine being migrated is attached must exist on both ESX hosts. Port group naming is case sensitive, so create identical port groups on each host and make sure they plug into the same physical subnets or VLANs; a port group named Production is not the same as a port group named PRODUCTION. Remember that to prevent downtime, the virtual machine does not change its network address as it is moved. The virtual machine retains its MAC address and IP address so clients connected to it don't have to resolve any new information to reconnect.

♦ Processors in both hosts must be compatible. When a virtual machine is transferred between hosts, the virtual machine has already detected the type of processor it is running on when it booted. Since the virtual machine is not rebooted during a VMotion, the guest assumes the CPU instruction set on the target host is the same as on the source host. We can get away with slightly dissimilar processors, but in general the processors in two hosts that perform VMotion must meet the following requirements:

♦ CPUs must be from the same vendor (Intel or AMD).

♦ CPUs must be from the same CPU family (PIII, P4, Opteron).

♦ CPUs must support the same features, such as the presence of SSE2, SSE3, and SSE4, and NX or XD (see the sidebar, “Processor Instruction”).

♦ For 64-bit virtual machines, CPUs must have virtualization technology enabled (Intel VT or AMD-V).

Processor Instruction

SSE2 (Streaming SIMD Extensions 2) was an enhancement to the original SSE instruction set found in the PIII processor. The enhancement targeted the floating-point calculation capabilities of the processor by providing 144 new instructions. The SSE3 instruction set is an enhancement to the SSE2 standard targeted at multimedia and graphics applications. The newer SSE4 extensions target both graphics and application server workloads.

Intel's XD (eXecute Disable) and AMD's NX (No eXecute) are processor features that mark memory pages as data only, which prevents a virus from running executable code at those addresses. The operating system needs to be written to take advantage of this feature, and in general, versions of Windows starting with Windows 2003 SP1 and Windows XP SP2 support this CPU feature.

The latest processors from Intel and AMD have specialized support for virtualization. The AMD-V and Intel Virtualization Technology (VT) must be enabled in the BIOS in order to create 64-bit virtual machines.

VMware includes a utility named cpuid.iso.gz in the images subdirectory of the ESX Server installation CD. This tool can test a server to identify which CPU features the host processors have. To perform the test, unzip the file and burn a bootable CD from it, then boot each ESX host from the cpuid CD and compare the output. If the outputs match, the hosts are CPU compatible for VMotion. If they don't match, you need to determine what doesn't match.

On a per-virtual machine basis, you'll find a setting that tells the virtual machine to show or mask the NX/XD bit in the host CPU. Masking the NX/XD bit from the virtual machine tells the virtual machine that there's no NX/XD bit present. This is useful if you have two otherwise compatible hosts with an NX/XD bit mismatch. If the virtual machine doesn't know the bit exists, it won't care whether the target host has that bit when you migrate the virtual machine using VMotion. The greatest VMotion compatibility is achieved by masking the NX/XD bit. If the NX/XD bit is exposed to the virtual machine, as shown in Figure 9.33, the BIOS setting for NX/XD must match on both the source and destination ESX Server hosts.

What happens if you have SSE3 features on one host and not on the other? For mismatched SSE3 and SSE4 processors, you can change the masking by clicking the Advanced button, shown in Figure 9.33, and entering the CPU parameters you wish to mask, as shown in Figure 9.34.

Some administrators might recognize that this is a tedious task if you already have dozens of virtual machines built, because the setting must be changed on each virtual machine. However, if you know you have mismatched NX/XD bits or SSE3/SSE4 masks, you can change your template virtual machine, and any virtual machine deployed from that template will also have the same setting.

Figure 9.33 Masking the NX/XD bit on a virtual machine


Figure 9.34 Masking SSE3 extensions for an Intel CPU


In addition to the VMotion requirements for the hosts involved, there are requirements that must be met by the virtual machine to be migrated:

♦ The virtual machine must not be connected to any device physically available to only one ESX host. This includes disk storage, CD-ROMs, floppy drives, serial ports, or parallel ports. If the virtual machine to be migrated has one of these mappings, simply clear the Connected check box beside the offending device, as shown in Figure 9.35.

Figure 9.35 Clear the Connected box for any locally mapped device prior to migrating with VMotion.


♦ The virtual machine must not be connected to an internal-only switch.

♦ The virtual machine must not have its CPU affinity set to a specific CPU.

♦ The virtual machine must not have a RAW disk mapping as part of a Microsoft Cluster Services (MSCS) configuration.

♦ The virtual machine must have all disk, configuration, log, and nonvolatile random access memory (NVRAM) files stored on a volume visible to both the source and the destination ESX Server host.

If you start a VMotion migration and VirtualCenter finds an issue that is considered a violation of the VMotion compatibility rules, you will see an error message. In some cases, a warning, not an error, will be issued. In the case of a warning, the VMotion migration will still succeed. For instance, if you have cleared the check box on the host-attached floppy drive, VirtualCenter will tell you there is a mapping to a host-only device that is not active. You'll see a prompt asking if the migration should take place anyway.

VMware states that you need a gigabit network card for VMotion; however, it does not have to be dedicated to VMotion. When you're designing the ESX Server host, dedicate a network adapter to VMotion if possible. Doing so reduces contention on the VMotion network and allows migrations to complete quickly and efficiently.

To perform a VMotion migration of a virtual machine:

1. Select a powered-on virtual machine in your inventory, right-click the virtual machine, and select Migrate, as shown in Figure 9.36.

Figure 9.36 Starting the VMotion process


2. Choose the target host, as shown in Figure 9.37.

Figure 9.37 Choose the target host.


3. Choose the target resource pool (or cluster). Most of the time the same resource pool (or cluster) that the virtual machine currently resides in will suffice. Choosing a different resource pool might change that virtual machine's priority access to resources, as shown in Figure 9.38.

Figure 9.38 Choose a target resource pool for the virtual machine being migrated.


4. Select the priority with which the VMotion migration should proceed. Be aware that choosing high priority will cause CPU stress on the hosts involved in the migration, as described in the option shown in Figure 9.39.

Figure 9.39 Choosing a priority level for the migration


5. Click Finish once the validation has concluded and you have reviewed the information on the summary screen, as shown in Figure 9.40.

6. The virtual machine should start to migrate. Often, the process pauses at about 10% in the progress dialog box and again at about 90%. The 10% pause occurs while the hosts in question establish communications and gather the information for the memory pages to be migrated; the 90% pause occurs when the source virtual machine is quiesced and the remaining dirty memory pages are copied from the source host, as shown in Figure 9.41.

Figure 9.40 Click Finish to start the actual migration.


VMotion is an invaluable tool for virtual administrators, and certainly the feature that put ESX Server on the map. But VMotion, in this latest version of ESX Server, has evolved into more than just a simple tool for moving virtual machines. VMotion is the backbone for the new Distributed Resource Scheduler (DRS) feature that can be enabled on ESX Server clusters. Before we get to the details of DRS, you need to understand clusters.

Figure 9.41 Progress of a VMotion migration
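For administrators who prefer to script the process, the same migration can be driven through the VirtualCenter API. The sketch below, again using the pyVmomi Python bindings as an illustration, starts a VMotion of a powered-on virtual machine to a target host while keeping the current resource pool; the function name and polling loop are this example's own.

import time
from pyVmomi import vim

def vmotion(vm, target_host, priority='defaultPriority'):
    # Migrate a powered-on VM (vim.VirtualMachine) to target_host
    # (vim.HostSystem). priority may also be 'highPriority' or 'lowPriority'.
    task = vm.MigrateVM_Task(pool=vm.resourcePool,
                             host=target_host,
                             priority=priority)
    while task.info.state not in ('success', 'error'):
        time.sleep(2)                     # simple polling; adjust as needed
    return task.info.state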

Clusters

As virtual environments grow, organizations can and will add multiple ESX Server hosts to handle the workload of an ever-increasing, sometimes exponentially increasing, number of virtual machines. Some of the concerns with adding a number of stand-alone ESX hosts, even those managed by VirtualCenter, include questions such as “What happens when a host fails?” and “How can I effectively balance the load across more than one ESX host?” VMware Infrastructure 3 (VI3) addresses both of these issues through clusters of ESX Servers.

What is a cluster? A cluster is a group of 2 to 32 ESX Server hosts that work cooperatively to provide features such as High Availability (HA) and Distributed Resource Scheduler (DRS). Clusters themselves are implicitly resource pools; however, resource pools can also be built under a cluster. This gives the administrator a larger collection of resources to carve up, and a virtual machine can run on any node in the cluster while still being governed by its membership in the resource pool.

Cluster setup is fairly straightforward. There are no special hardware requirements beyond what an ESX host should already have. Each host must be able to communicate with the other hosts over the Service Console network, and all nodes of the cluster must be managed by the same VirtualCenter server. Additionally, all hosts in the cluster must belong to the same datacenter in VirtualCenter because the cluster is a child of a Datacenter object.

To create a cluster, right-click a Datacenter object in the VirtualCenter inventory and select the New Cluster option as shown in Figure 9.42.

Figure 9.42 Cluster creation


Once the cluster has been created, ESX hosts can be moved into the cluster by dragging and dropping them onto the cluster object.
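Cluster creation can also be scripted. The following pyVmomi sketch connects to VirtualCenter, locates a datacenter by name, and creates an empty cluster under its host folder; the hostnames, credentials, and names shown are placeholders, and hosts are added to the cluster afterward just as with drag-and-drop.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def create_cluster(vc_host, user, pwd, datacenter_name, cluster_name):
    # Create an empty cluster (HA and DRS disabled by default).
    ctx = ssl._create_unverified_context()      # lab use only
    si = SmartConnect(host=vc_host, user=user, pwd=pwd, sslContext=ctx)
    try:
        content = si.RetrieveContent()
        dc = next(d for d in content.rootFolder.childEntity
                  if isinstance(d, vim.Datacenter) and d.name == datacenter_name)
        spec = vim.cluster.ConfigSpecEx()
        return dc.hostFolder.CreateClusterEx(name=cluster_name, spec=spec)
    finally:
        Disconnect(si)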

Cluster Limits

There is a functional limit to the number of hosts in an ESX cluster, and it depends on which features are enabled on the cluster itself. For ESX 3.5 with VirtualCenter 2.5, the absolute limit is 32 ESX hosts per cluster; however, VMware's recommended maximum is 16 hosts per cluster in either case. If you have more hosts in the datacenter than will (or can) be used in one cluster, consider building multiple clusters, which can be a benefit when grouping hosts with matching processors for VMotion or when different cluster settings are required.

If an ESX Server host contains resource pools and is added to a non-DRS cluster, a warning message stating that existing resource pools will be removed appears, as shown in Figure 9.43.

Figure 9.43 Adding a host with an existing resource pool to a non-DRS-enabled cluster


To preserve resource pools and the settings on the host added to the cluster, select the No option on the warning message shown in Figure 9.43.

Once a cluster is created, DRS can be enabled. To enable DRS, right-click the cluster to edit the settings. Then enable DRS on the cluster, as shown in Figure 9.44.

Figure 9.44 Enabling DRS on a cluster
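The same setting can be made through the API. As a sketch (again via pyVmomi, with the function name being this example's own), DRS is enabled by reconfiguring the cluster with a configuration spec that carries the DRS settings.

from pyVmomi import vim

def enable_drs(cluster, behavior='manual'):
    # Enable DRS on a vim.ClusterComputeResource. behavior may be
    # 'manual', 'partiallyAutomated', or 'fullyAutomated'.
    spec = vim.cluster.ConfigSpecEx(
        drsConfig=vim.cluster.DrsConfigInfo(enabled=True,
                                            defaultVmBehavior=behavior))
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)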


If an ESX Server host contains resource pools and is added to a DRS-enabled cluster, a wizard will be initiated that allows existing resource pools to be deleted or maintained, as shown in Figure 9.45.

Figure 9.45 Adding an ESX Server host that contains resource pools to a DRS-enabled cluster offers the option to keep the existing resource pools or delete them.


The options provided, as shown in Figure 9.45, are just as they sound. The first option will delete any existing resource pools on the host, and the virtual machines in them will be a part of the cluster but not part of a resource pool. The second option will add the existing resource pools on the host as child objects of a new resource pool created in the cluster, as shown in Figures 9.46 and 9.47.

Figure 9.46 Confirm the addition of a host to the cluster.


Figure 9.47 Resource pools kept when a host is added to a cluster will, by default, fall under a parent pool that begins with the name “Grafted from” followed by the hostname.


Once the resource pools have been migrated, you can eliminate the additional “Grafted from…” resource pool using one of two methods:

♦ Drag the child resource pools from the newly created resource pool and drop them onto the cluster itself.

♦ Move the virtual machines from the imported resource pools into other existing resource pools in the cluster (after adjusting the resource pools to reflect the new reservations and/or limits required to support the newly added virtual machines).

Since Chapter 10 deals extensively with HA, we'll focus here on DRS and how it affects resource management.

Exploring Distributed Resource Scheduler (DRS)

DRS is a cluster-level feature of VirtualCenter that balances load across multiple ESX hosts. It has two main functions: the first is to decide which node of a cluster should run a virtual machine when it is powered on, and the second is to evaluate the load on the cluster over time and either make recommendations for migrations or automatically move virtual machines, using VMotion, to create a more balanced cluster workload. Fortunately for those of us who like to retain control, there are parameters that set how aggressively DRS will automatically move virtual machines around the cluster.

Start by looking at the DRS properties, shown in Figure 9.48. There are three selections for the automation level of the DRS cluster: Manual, Partially Automated, and Fully Automated; the slider bar affects only the actions of the Fully Automated setting. These settings control the initial placement of a virtual machine and the automatic movement of virtual machines between hosts.

Figure 9.48 A DRS cluster can be set to automate as much or as little as desired.

Manual

The Manual setting of the DRS cluster prompts you, every time you power on a virtual machine, to choose the node on which that virtual machine should be launched. The dialog box rates the available hosts according to their suitability at that moment in time: the more stars, the better the choice, as shown in Figure 9.49.

The Manual setting will also suggest migrations when DRS detects an imbalance between ESX hosts in the cluster. This is an averaging process that works over longer periods of time than many of us are used to in the information technology field; it is unusual to see DRS make any recommendations unless an imbalance has existed for longer than five minutes. The recommended list of migrations is available by selecting the cluster in the inventory and then selecting the Migrations tab, as shown in Figure 9.50.

Figure 9.50 shows that DRS rates both the SERVER1 virtual machine and the Win2008-02 virtual machine as very strong candidates to move from host Silo106 to Silo105 and Silo104, respectively. The number of stars a migration can have ranges from one to five: a one-star migration is a gentle suggestion, while a five-star migration is the strongest recommendation.

Figure 9.49 A DRS cluster set to Manual will let you specify where the virtual machine should be powered on.


Figure 9.50 Recommended migrations for a DRS cluster


To agree with DRS and start the migration, select the virtual machine you want to migrate on the Migrations tab and click the Apply Migration Recommendation button. VMotion will handle the migration automatically.
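The Apply Migration Recommendation button has a scriptable counterpart as well. In the API, pending recommendations are exposed on the cluster object and each can be applied by its key; the sketch below (pyVmomi, illustrative function name) applies everything at or above a chosen star rating.

from pyVmomi import vim

def apply_drs_recommendations(cluster, min_rating=4):
    # Apply pending DRS recommendations rated min_rating stars or higher
    # on a vim.ClusterComputeResource.
    applied = []
    for rec in cluster.recommendation:
        if rec.rating >= min_rating:
            cluster.ApplyRecommendation(key=rec.key)
            applied.append(rec.key)
    return applied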

Partially Automated

If you select the Partially Automated setting on the DRS properties, as shown in Figure 9.48, DRS will make an automatic decision about which host a virtual machine should run on when it is initially powered on (without prompting the administrator performing the power-on task), but it will still prompt for approval of all migrations on the Migrations tab.

Fully Automated

The third setting for DRS is Fully Automated. This setting will make decisions for initial placement without prompting and will also make automatic VMotion decisions based on the selected automation level (the slider bar).

There are five positions for the slider bar on the Fully Automated setting of the DRS cluster. The values of the slider bar range from Conservative to Aggressive, as shown in Figure 9.48. The Conservative setting automatically applies only migrations rated five stars; any other migrations are listed on the Migrations tab and require administrator approval. If you move the slider bar one stop to the right of the most conservative setting, all four- and five-star migrations will be approved automatically, while three-star and lower migrations wait for administrator approval. With the slider all the way over to the Aggressive setting, any imbalance in the cluster that produces a recommendation will be approved automatically. Be aware that this can cause additional stress in your ESX host environment, because even a slight imbalance will trigger a migration.
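In the API, the automation level and the slider map to fields on the cluster's DRS configuration. A minimal sketch of building that configuration follows (pyVmomi again); it would be applied with the same reconfigure call shown in the earlier enable-DRS sketch, and the exact mapping between slider positions and vmotionRate values is documented in the API reference rather than asserted here.

from pyVmomi import vim

def drs_config(behavior='fullyAutomated', rate=3):
    # Build a DRS configuration: behavior sets the automation level,
    # vmotionRate (1-5) corresponds to the migration threshold slider.
    return vim.cluster.ConfigSpecEx(
        drsConfig=vim.cluster.DrsConfigInfo(enabled=True,
                                            defaultVmBehavior=behavior,
                                            vmotionRate=rate))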

DRS runs an algorithm whose calculations are constantly reevaluated, so migration recommendations can change regularly. Assume that during a period of high activity DRS makes a three-star recommendation, the automation level is set so that three-star migrations need manual approval, and the recommendation goes unnoticed (or no administrator is even in the office).

An hour later, the virtual machines that caused the three-star recommendation in the first place have settled down and are now operating normally. At this point the Migrations tab no longer lists the recommendation; it has been withdrawn. This behavior occurs because if the migration were still listed, an administrator might approve it and create an imbalance where none existed.

As mentioned earlier, five-star migrations often have little to do with load on the cluster. The first action that causes a five-star migration recommendation is putting a host into Maintenance Mode, as shown in Figure 9.51.

Figure 9.51 An ESX Server host put into Maintenance Mode cannot power on new virtual machines or be a target for VMotion.


Maintenance Mode is a setting on a host that allows virtual machines currently hosted on it to continue to run but does not permit new virtual machines to be launched on that host, either manually or via a VMotion or DRS migration. Additionally, when a host belonging to an automated DRS cluster is placed into Maintenance Mode, all of the virtual machines currently running on that host receive a five-star migration recommendation, which causes them to be migrated to other hosts (assuming they meet the requirements for VMotion).
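Maintenance Mode can be requested programmatically as well; a one-call sketch with pyVmomi is shown below, where a timeout of zero means the task waits indefinitely for the host to be evacuated.

from pyVmomi import vim

def enter_maintenance_mode(host, timeout_s=0):
    # Request Maintenance Mode on a vim.HostSystem. On a fully automated
    # DRS cluster, DRS evacuates the running virtual machines; otherwise
    # an administrator must approve the five-star recommendations.
    return host.EnterMaintenanceMode_Task(timeout=timeout_s)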

The second item that will cause a five-star migration recommendation is when two virtual machines defined in an anti-affinity rule are run on the same host, or when two virtual machines defined in an affinity rule are run on different hosts. This leads us to a discussion of DRS rules.

Real World Scenario

A Quick Review of DRS Cluster Performance

Monitoring the detailed performance of a cluster is an important task for any virtual infrastructure administrator, particularly the CPU and memory activity of the whole cluster as well as the resource utilization of the individual virtual machines within it. The Summary tab of the details pane for a cluster object includes a pair of bar charts that can provide administrators with a quick performance snapshot of a DRS cluster.

The top chart in the VMware DRS Cluster Distribution snapshot reflects the CPU and memory utilization of the hosts in the cluster. The bottom chart reflects the percentage of entitled resources that have been delivered to each virtual machine by the ESX Server host on which it runs. In plain terms, the top chart shows how hard the ESX Server hosts are working, while the bottom chart shows whether the virtual machines are getting the resources they require.

The figure shown below indicates that two of the three ESX Server hosts are using between 0 and 10 percent of their memory and that the third host is using between 30 and 40 percent of its memory. It also shows that all three hosts are able to deliver the resources required by the virtual machines running on each respective host.

For the resource-utilization-conscious virtual infrastructure administrator, the best-looking charts would show all bars toward the left-hand side of the top chart and all bars toward the right-hand side of the bottom chart. This would indicate that the ESX Server hosts are not working too hard, yet the virtual machines are getting all the resources they require.

By keeping an eye on these summary charts, administrators will have a good indication of when it is time to dig deeper into cluster performance or perhaps even make a decision about growing the cluster by adding more hosts. For example, if the bars on the top chart began creeping into the 40 to 50 percent or 50 to 60 percent range, but the bottom chart still showed 90 to 100 percent, an administrator would know that the time to add a new host is approaching: all current virtual machines are getting the resources they require, but the hosts are utilizing up to 60 percent of their resources.
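If you want the same snapshot outside the VI Client, the quick statistics that feed these charts are available through the API. The sketch below (pyVmomi, with illustrative arithmetic) prints an approximate CPU and memory utilization percentage for each host in a cluster.

from pyVmomi import vim

def host_utilization(cluster):
    # Print rough CPU/memory utilization for each host in a
    # vim.ClusterComputeResource, based on the summary quick statistics.
    for host in cluster.host:
        hw = host.hardware
        qs = host.summary.quickStats
        cpu_total_mhz = (hw.cpuInfo.hz / 1000000.0) * hw.cpuInfo.numCpuCores
        mem_total_mb = hw.memorySize / (1024.0 * 1024.0)
        cpu_pct = 100.0 * qs.overallCpuUsage / cpu_total_mhz
        mem_pct = 100.0 * qs.overallMemoryUsage / mem_total_mb
        print('%s: CPU %.0f%%, memory %.0f%%' % (host.name, cpu_pct, mem_pct))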

DRS Rules

An administrator creates DRS rules to control which virtual machines DRS should keep together on the same host and which virtual machines it should keep apart. Consider the rules page shown in Figure 9.52.

Consider an environment with two mail server virtual machines. In all likelihood, administrators would not want both mail servers to reside on the same ESX Server host. At the same time, administrators might want a web application server and its back-end database server to reside on the same host at all times to ensure rapid response times between them. Both scenarios are served well by DRS rules.

Figure 9.52 DRS affinity (Keep Virtual Machines Together) and anti-affinity rules (Separate Virtual Machines)


Perform the following steps to create a DRS anti-affinity rule:

1. Right-click the DRS cluster where the rules need to exist and select the Edit Settings option.

2. Click the Rules option.

3. Type a name for the rule and select the type of rule to create:

♦ For anti-affinity rules, select the Separate Virtual Machines option.

♦ For affinity rules, select the Keep Virtual Machines Together option.

4. Click the Add button to add the necessary virtual machines to the rule, as shown in Figure 9.53.

5. Click OK.

6. Review the new rule configuration, as shown in Figure 9.54.

7. Click OK.
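The same rules can be defined through the API. The sketch below (pyVmomi; the function name is this example's own) adds a Separate Virtual Machines rule to a cluster; substituting vim.cluster.AffinityRuleSpec builds a Keep Virtual Machines Together rule instead.

from pyVmomi import vim

def add_anti_affinity_rule(cluster, name, vms):
    # vms is a list of vim.VirtualMachine objects that should never
    # run on the same ESX host.
    rule = vim.cluster.AntiAffinityRuleSpec(name=name, enabled=True, vm=vms)
    spec = vim.cluster.ConfigSpecEx(
        rulesSpec=[vim.cluster.RuleSpec(operation='add', info=rule)])
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)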

With DRS rules, it is possible to create rules that cannot be satisfied, such as a “Separate Virtual Machines” rule containing three virtual machines on a DRS cluster that has only two hosts. In this situation, VirtualCenter will report warnings because DRS cannot satisfy the requirements of the rule.

Rules can be temporarily disabled by clearing the check box next to the rule, as shown in Figure 9.55.

Although most virtual machines should be allowed to take advantage of the DRS balancing act, there will likely be enterprise-critical virtual machines that administrators are adamant about keeping off the list of VMotion candidates. These virtual machines should nevertheless remain in the cluster to take advantage of the High Availability (HA) feature; in other words, they will take part in HA but not DRS, even though both features are enabled on the cluster. As shown in Figure 9.56, virtual machines in a cluster can be configured with individual DRS automation levels.

Figure 9.53 Adding virtual machines to a DRS rule


Figure 9.54 A completed DRS rule


Figure 9.55 Temporarily disabling a DRS rule


Figure 9.56 Virtual machine options for a DRS cluster


This dialog box lists the virtual machines that are part of the cluster and their default automation level. In this case, all virtual machines are set at Fully Automated because that's how the automation level of the cluster was set. The administrator can then selectively choose virtual machines that are not going to be acted upon by DRS in the same way as the rest in the cluster. The automation levels available include:

♦ Fully Automated

♦ Manual

♦ Partially Automated

♦ Disabled

♦ Default (inherited from the cluster setting)

The first three options work as discussed earlier in this chapter. The Disabled option turns off DRS for that virtual machine, including automatic host selection at startup and migration recommendations. The Default option configures the virtual machine to inherit the automation level set on the cluster.
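Per-virtual machine overrides are also expressed in the cluster's configuration. As a final pyVmomi sketch (function name illustrative), the following adds an override that pins one virtual machine to a chosen automation level, or disables DRS for it entirely when enabled is False.

from pyVmomi import vim

def set_vm_drs_override(cluster, vm, behavior='manual', enabled=True):
    # behavior: 'manual', 'partiallyAutomated', or 'fullyAutomated';
    # enabled=False disables DRS for this virtual machine.
    override = vim.cluster.DrsVmConfigInfo(key=vm, enabled=enabled,
                                           behavior=behavior)
    spec = vim.cluster.ConfigSpecEx(
        drsVmConfigSpec=[vim.cluster.DrsVmConfigSpec(operation='add',
                                                     info=override)])
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)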

At Least Be Open to Change

Even if one or more virtual machines have been excluded from DRS automation, it is best not to set them to the Disabled option, since no recommendations will be provided for them. It is possible that a four- or five-star recommendation could suggest moving a virtual machine that an administrator believed was best left on a specific host. For this reason, the Manual option is the better choice: at least be open to the possibility that a virtual machine might perform better on a different host.

VI3 provides a number of tools that make administrators' lives easier, as long as those tools are understood and set up properly. It is also prudent to monitor the activity of these tools to see whether a configuration change is warranted as your environment grows.

The Bottom Line

Manage virtual machine memory. The VMkernel is active and aggressive in its management of memory utilization across the virtual machines.

Master It A virtual machine needs to be guaranteed 1GB of RAM.

Master It A virtual machine should never exceed 2GB of physical memory.

Manage virtual machine CPU allocation. The VMkernel works actively to monitor, schedule, and migrate data across CPUs.

Master It A virtual machine must be guaranteed 1000 MHz of CPU.

Create and manage resource pools. Resource pools portion CPU and memory from a host or cluster to establish resource limits for pools of virtual machines.

Master It A resource pool needs to be able to exceed its reservation to provide for additional resource guarantees to virtual machines within the pool.

Configure and execute VMotion. VMotion technology is a unique feature of VMware Infrastructure 3 (VI3) that allows a running virtual machine to be moved between hosts.

Master It Identify the virtual machine requirements for VMotion.

Master It Identify the ESX Server host requirements for VMotion.

Create and manage clusters. A cluster groups ESX Server hosts into a single pool of resources and provides the foundation for features such as High Availability and the Distributed Resource Scheduler.

Master It Five ESX Server hosts need to be grouped together for the purpose of enabling the Distributed Resource Scheduler (DRS) feature of VI3.

Configure and manage Distributed Resource Scheduling (DRS). DRS builds off the success and efficiency of VMotion by offering an automated VMotion based on an algorithm that analyzes system workloads across all ESX Server nodes in a cluster.

Master It A DRS cluster should determine on which ESX Server host a virtual machine runs when the virtual machine is powered on, but it should only recommend migrations for VMotion.

Master It A DRS cluster should determine on which ESX Server host a virtual machine runs when the virtual machine is powered on, and it should also manage where it runs for best performance. A VMotion should only occur if a recommendation is determined to be a four- or five-star recommendation.

Master It Two virtual machines running a web application and a back-end database should be kept together on an ESX Server host at all times. If one should be the target of a VMotion migration, the other should be as well.

Master It Two virtual machines with database applications should never run on the same ESX Server host.
