Chapter 11 Monitoring Virtual Infrastructure Performance

The monitoring of a virtual infrastructure should be a combination of proactive benchmarking and reactive alarm-based actions. VirtualCenter provides both methods to help the administrator keep tabs on each of the virtual machines and hosts as well as the hierarchical objects in the inventory. Using both methods will ensure that the administrator will not be caught unaware of performance issues or lack of capacity.

VirtualCenter provides some exciting new features for monitoring your virtual machines and hosts. The new and improved Performance tab has updated graphs and more options for customization. The Datacenter, Folder, Cluster, and ESX Server objects have Virtual Machines tabs to give the administrator an at-a-glance view of how the virtual machines are running.

In this chapter you will learn to:

Create an alarm

Work with graphs

Customize host and virtual machine graphs for CPU, Memory, Network, and Disk

Save a graph

Creating Host and Virtual Machine Alarms

The Performance tab provides a robust mechanism for creating graphs that depict the actual resource consumption over time for a given host or virtual machine. The graphs provide historical information and can be used for trend analysis. VirtualCenter provides many objects and counters to analyze the performance of a single virtual machine or host for a selected interval. The Virtual Machines tab provides information on a virtual machine's CPU and memory consumption, as shown in Figure 11.1.

In addition to the graphs and high-level information tabs, the administrator can create alarms for virtual machines or hosts based on predefined triggers provided with VirtualCenter. These alarms can monitor resource consumption or the state of the virtual machine and alert the administrator when certain conditions have been met, such as high resource usage or even low resource usage. These alarms can then provide an action that informs the administrator of the condition by e-mail or SNMP trap. An action can also automatically run a script or provide other means to correct the problem the virtual machine or host may be experiencing.

Figure 11.1 The Virtual Machines tab of a Cluster object offers a quick look at virtual machine CPU and memory usage.


The creation of alarms to alert the administrator of a specific condition is not new in VirtualCenter. But the addition of new triggers, conditions, and actions makes the alarms more useful than in previous versions. As you can see in Figure 11.2, the alarms that come with VirtualCenter are defined at the topmost object, Hosts & Clusters.

Figure 11.2 Default alarms for hosts and virtual machines are created at the Hosts & Clusters inventory object.


The default alarms are generic in nature and identify host or virtual machine CPU and memory consumption that exceeds the 75% and 90% thresholds. These alarms will also identify a change in the state of the host or virtual machine. In most cases, virtual machines won't need alarms with thresholds this high, because they usually run low-utilization applications. And while hosts might reach those levels during a server outage, a good virtualization architecture will prevent such high utilization on a consistent basis.

Since the default alarms are likely too generic for your administrative needs, creating your own alarms is often necessary. An administrator may want to watch a specific virtual machine to see if it drops below a certain threshold. A service-level agreement (SLA) for this virtual machine may specify that consumption not drop below 20% CPU utilization to meet its obligations. Also, let's say that this virtual machine normally runs around 30% CPU utilization when the application is running hot. The administrator could set a virtual machine alarm with a CPU trigger, an Is Below condition, and a warning threshold of 25%, so that an e-mail warning is sent when the virtual machine drops below 25%. The same alarm can use an alert threshold of 20%; if the virtual machine drops below that level, a second e-mail alerts the administrator so corrective action can be taken.
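The SLA example above can be sketched in a few lines of Python. This is only an illustration of the warning and alert logic, not VirtualCenter's own implementation; the function name and the use of strict comparisons at the boundary are assumptions.

```python
def evaluate_usage_trigger(usage_pct, condition, warning_pct, alert_pct):
    """Return 'green', 'yellow', or 'red' for a single usage reading.

    condition is "Is Above" or "Is Below", the two conditions available
    for usage triggers. Strict comparisons are an assumption.
    """
    if condition == "Is Above":
        if usage_pct > alert_pct:
            return "red"
        if usage_pct > warning_pct:
            return "yellow"
    elif condition == "Is Below":
        if usage_pct < alert_pct:
            return "red"
        if usage_pct < warning_pct:
            return "yellow"
    else:
        raise ValueError(f"unknown condition: {condition!r}")
    return "green"


# The SLA example: warn below 25%, alert below 20%.
for reading in (30, 24, 18):
    print(reading, evaluate_usage_trigger(reading, "Is Below", 25, 20))
```

A reading of 30% stays green, 24% produces the warning, and 18% produces the alert.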

Figure 11.2 shows the Alarms dialog box used to configure the triggers, thresholds, and actions of an alarm. The dialog box includes two buttons near the top left: Triggered Alarms and Definitions. Clicking the Triggered Alarms button displays any alarms currently triggered for this inventory object. Clicking the Definitions button switches the view to show any alarms that have been created on, or inherited by, this inventory object. Speaking of inheritance, if you create an alarm on a datacenter, folder, cluster, or resource pool, the alarm will be inherited by all child objects below it. In some cases, this is what you want, but in other cases, the alarm may be specific to a particular virtual machine or host. In such cases, create the alarm directly on that object so that nothing else inherits it. Figure 11.3 shows the configuration of a trigger and thresholds for an alarm.

Figure 11.3 A sample alarm showing a virtual machine CPU alarm definition with an Is Below condition, a % Warning of 25%, and % Alert of 20%.
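The inheritance rule described above can be pictured with a small Python model. The class, object names, and alarm names here are hypothetical; the sketch only shows that an object's effective alarms are its own plus those of every ancestor.

```python
class InventoryObject:
    """Hypothetical stand-in for a datacenter, folder, cluster, pool, host, or VM."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.alarms = []  # alarms defined directly on this object

    def effective_alarms(self):
        """Alarms inherited from ancestors, followed by alarms defined here."""
        inherited = self.parent.effective_alarms() if self.parent else []
        return inherited + self.alarms


root = InventoryObject("Hosts & Clusters")
cluster = InventoryObject("Cluster1", parent=root)
vm_a = InventoryObject("vm-a", parent=cluster)
vm_b = InventoryObject("vm-b", parent=cluster)

root.alarms.append("Virtual Machine CPU Usage")  # defined at the top, inherited by all
vm_a.alarms.append("vm-a CPU Is Below")          # defined on one VM, inherited by none

print(vm_a.effective_alarms())
print(vm_b.effective_alarms())
```

Only vm-a carries the specific alarm, while both virtual machines pick up the alarm defined at the top of the inventory.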


Perform the following steps to create an alarm:

1. Click to select a particular object in the inventory, such as a virtual machine, host, resource pool, cluster, folder, or datacenter. Then select the Alarms tab, and right-click in the blank area of the Definitions pane, as shown in Figure 11.4, and choose New Alarm.

Figure 11.4 The starting point for a new alarm.


2. In the Alarm Settings dialog box, specify whether this will be a host-based alarm or a virtual machine-based alarm. In either case, most of the setup will be the same.

3. On the General tab, select the alarm type and then choose a trigger priority of either Red or Green, as shown in Figure 11.5. The red and green alarm notifications are simply visual cues of changes. They are arbitrary but can represent a good change (green) or a bad change (red).

Figure 11.5 On the General tab, you specify an alarm name, alarm type, and trigger priority.


Red Light, Yellow Light, Green Light

Most alarms are of the red variety when it comes to the Trigger Priority setting. But occasionally the administrator wants to know when a virtual machine hits a warning level or when a virtual machine returns to a green condition. The yellow warnings allow administrators to catch potential problems before they reach unacceptable levels that result in performance declines. For instance, a virtual machine that kicks off a batch job during the day goes red when the job starts, and when the batch is done the virtual machine goes back to green and alerts the administrator that the job has completed.

4. If this alarm goes immediately into service, select the Enable This Alarm option.

5. Click the Triggers tab.

6. A trigger, shown in Figure 11.6, provides a way to specify a particular condition or threshold to monitor. There are five possible triggers for an alarm:

♦ Host or Virtual Machine CPU Usage

♦ Host or Virtual Machine Memory Usage

♦ Host or Virtual Machine Network Usage

♦ Host or Virtual Machine Disk Usage

♦ Host or Virtual Machine State

Figure 11.6 Selecting a trigger for either a virtual machine or a host.


7. Once you choose a trigger type, you must specify a condition. For the Usage alarms there are only two conditions: Is Above and Is Below. For the State alarms, Is Equal To or Not Equal To are your only choices. If the trigger type is the state of a host or virtual machine, then setting the condition to Is Equal To could provide an alert if the virtual machine is powered off, as shown in Figure 11.7.

Figure 11.7 Setting a trigger type of Virtual Machine State with a condition of Is Equal To, a warning of None, and an alert of Powered Off.
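A state trigger such as the one in Figure 11.7 can be sketched as follows. The function is illustrative only; in particular, checking the alert state before the warning state is an assumption about how the two settings combine.

```python
def evaluate_state_trigger(state, condition, warning_state, alert_state):
    """Return 'green', 'yellow', or 'red' for a host or virtual machine state.

    warning_state or alert_state may be None, matching a setting of None.
    """
    if condition not in ("Is Equal To", "Not Equal To"):
        raise ValueError(f"unknown condition: {condition!r}")

    def matches(target):
        # A setting of None never matches, whatever the condition.
        if target is None:
            return False
        if condition == "Is Equal To":
            return state == target
        return state != target

    if matches(alert_state):   # assumption: alert takes precedence
        return "red"
    if matches(warning_state):
        return "yellow"
    return "green"


# Figure 11.7: Is Equal To, warning of None, alert of Powered Off.
print(evaluate_state_trigger("Powered Off", "Is Equal To", None, "Powered Off"))
print(evaluate_state_trigger("Powered On", "Is Equal To", None, "Powered Off"))
```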


Real World Scenario

Caution: Counter Values Will Vary!

The Is Above condition is selected most often for identifying a virtual machine or host that is over a certain threshold. The administrator decides what that threshold should be and what is considered abnormal behavior (or at least interesting enough behavior to be monitored). For the most part, monitoring across ESX Server hosts will be consistent. For example, administrators will define a threshold that is worthy of being notified about and configure an alarm across all hosts for monitoring that counter. However, when looking at the more granular virtual machine monitoring, it might be more difficult to come up with a single baseline that works for all virtual machines. Specifically, think about enterprise applications that must perform well for extended periods of time. For these types of scenarios, administrators would want custom alarms for earlier notifications of performance problems. This way, as opposed to reacting to a problem, administrators can be proactive in trying to prevent problems from occurring.

For virtual machines with similar functions like domain controllers and DNS servers, it might be possible to establish baselines and thresholds covering all such infrastructure servers. In the end, the beauty of the monitoring tools lies in the flexibility to be as customized and as granular as needed for each virtual infrastructure.

8. Next, provide the warning and alert thresholds required to monitor for specific usage conditions. The values for these two variables vary and depend on the type and resource tendencies of the application, SLA, or other abnormal or interesting behavior, as shown in Figure 11.8.

Figure 11.8 Setting warning and alert thresholds for an alarm.


9. After you've set these four variables on the Triggers tab, it's on to the Reporting tab, as shown in Figure 11.9.

I Know Already!

The Reporting tab is new to this version of VirtualCenter. This tab gives the administrator greater flexibility in defining how often an alarm sends an e-mail, sends an SNMP trap, or takes some other action. The Tolerance section provides a way to set the percentage of change required before a second notification is sent out. If the original alarm was set to send an e-mail at an initial trigger threshold of 50%, and the Tolerance field was set to 10%, then the second e-mail would be sent once the monitored value had climbed to 55% (10% of 50% is 5%). This allows the administrator to monitor escalating events or to avoid receiving another e-mail for the same persistent but unvarying condition.

The Frequency section allows the administrator to define how much time should elapse before another e-mail or SNMP trap is sent out. An example of where this may be helpful is if a virtual machine's CPU shoots up to over 75% and the administrator knows this is normal for this virtual machine given the nature of what it was designed to do. If the condition lasted longer than what was considered normal for that virtual machine, another e-mail will be sent after so many seconds have elapsed (which could turn into minutes if a very high value is used).

Both of these Reporting features can be used simultaneously or individually to give the administrator greater precision.

Figure 11.9 The Reporting tab, which contains Tolerance and Frequency settings.
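The Tolerance and Frequency arithmetic from the sidebar can be sketched in Python. Both helper names are hypothetical, and treating the tolerance as a percentage of the trigger value is an assumption drawn from the 50% to 55% example in the text.

```python
def next_tolerance_level(trigger_pct, tolerance_pct):
    """Value the reading must reach before a repeat notification is sent.

    Assumes tolerance is relative to the trigger value (10% of 50 is 5).
    """
    return trigger_pct + trigger_pct * tolerance_pct / 100


def frequency_allows_repeat(seconds_since_last, frequency_s):
    """True once the configured number of seconds has elapsed."""
    return seconds_since_last >= frequency_s


print(next_tolerance_level(50, 10))       # 55.0
print(frequency_allows_repeat(120, 300))  # False
print(frequency_allows_repeat(300, 300))  # True
```

How VirtualCenter combines the two settings when both are in use is not modeled here; the sketch treats them as independent checks.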


10. Click the Actions tab.

11. The reason for having the alarm is to monitor defined conditions and alert administrators, perhaps via e-mail as shown in Figure 11.10, so they can take any necessary administrative action. Administrators can also configure alarms to be proactive in trying to solve problems by allowing the alarm to respond in one of the predefined ways or by running a script, which allows an almost unlimited variety of actions. The precise action depends on whether the object being monitored is a host or a virtual machine. If it is a host, the possible actions are:

♦ Send a notification e-mail

♦ Send a notification trap

♦ Run a script

If the object being monitored is a virtual machine, then the possible actions are:

♦ Send a notification e-mail

♦ Send a notification trap

♦ Run a script

♦ Power on a virtual machine

♦ Power off a virtual machine

♦ Suspend a virtual machine

♦ Reset a virtual machine

Figure 11.10 An example of sending an e-mail if the alarm is triggered.


Alarm Scripts

If the action to be taken involves running a script, understand that the script runs on the VirtualCenter server and may consume significant resources. On the Actions tab, under the Action column, choose Run a Script. You must then supply a value. The syntax for calling the script is c:\fixmyvm.vbs {targetName} {alarmName} and must be passed as one string. The targetName is the host or virtual machine name; the alarmName is the name of the alarm.
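To make the single-string requirement concrete, the following Python sketch shows what the value might look like once the two placeholders are filled in. VirtualCenter performs this substitution itself; the helper function and the sample target and alarm names are hypothetical.

```python
# The value entered on the Actions tab, exactly as given in the text.
SCRIPT_VALUE = r"c:\fixmyvm.vbs {targetName} {alarmName}"


def expand_script_value(value, target_name, alarm_name):
    """Illustrate the command line a script would be called with."""
    return (value.replace("{targetName}", target_name)
                 .replace("{alarmName}", alarm_name))


# Hypothetical virtual machine and alarm names.
print(expand_script_value(SCRIPT_VALUE, "web01", "VM-CPU-Low"))
```

The script then receives the target name as its first argument and the alarm name as its second.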

Each action is tied to a change in condition or conditions that are listed to the right of the actions as checkboxes:

♦ From Green to Yellow

♦ From Yellow to Red

♦ From Red to Yellow

♦ From Yellow to Green
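The relationship between actions and the four transition checkboxes can be sketched as a lookup table. The particular checkbox selections below are a hypothetical configuration, not defaults.

```python
# Hypothetical configuration: each action lists the transitions whose
# checkboxes are selected on the Actions tab.
ACTIONS = {
    "Send a notification e-mail": {("green", "yellow"), ("yellow", "red")},
    "Run a script": {("yellow", "red")},
    "Send a notification trap": {("yellow", "green")},
}


def actions_for_transition(old_status, new_status, actions=ACTIONS):
    """Return the actions that fire when the alarm changes status."""
    return [name for name, transitions in actions.items()
            if (old_status, new_status) in transitions]


print(actions_for_transition("yellow", "red"))
print(actions_for_transition("red", "yellow"))
```

With this configuration, a change from yellow to red sends the e-mail and runs the script, while a change from red to yellow triggers nothing.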

12. Once you've configured all the tabs, click OK.

13. To have VirtualCenter send an e-mail for a triggered host or virtual machine alarm, provide the recipient's e-mail address (see Figure 11.11). VirtualCenter must also be configured with an SMTP server to send any e-mails. Normally, the setup of the SMTP server and the SNMP management receiver(s) would be established ahead of time. To configure the SMTP server, from the main VI Client screen choose the Administration menu, then VirtualCenter Management Server Configuration. Click on Mail in the list on the left, and then supply the SMTP server and the Sender Account so that when you receive an e-mail, you know it came from the VirtualCenter server, such as VI3alarms@learn2virtualize.com. To have VirtualCenter send an SNMP trap, follow the same procedure but click on SNMP in the VirtualCenter Management Server Configuration dialog box on the left and specify one to four management receivers to monitor for traps.

Figure 11.11 Setting up the SMTP server to send e-mails on behalf of the alarms on the VirtualCenter server.


Once the four tabs have been configured, click OK and your alarm will be added to the list for that object. You can have more than one alarm for an object. As with any new alarm, testing its functionality is crucial to make sure you get the desired results. If the alarm needs editing, right-click the alarm from the list and choose Edit Settings to make the necessary modifications. Or, if the alarm is no longer needed, right-click the alarm and choose Remove to delete the alarm.

Performance Graph Details and esxtop

VirtualCenter has many new and updated features for creating and analyzing graphs. Without these graphs, analyzing the performance of a virtual machine would be nearly impossible. Installing agents inside a virtual machine will not provide accurate details about the server's behavior or resource consumption. The reason for this is elementary: a virtual machine is only configured with virtual devices. Only the VMkernel knows the exact amount of resource consumption for any of those devices since it acts as the translator between the virtual hardware and the physical hardware. In most virtual infrastructures, the virtual machines' virtual devices can outnumber the actual physical hardware devices, necessitating complex sharing and scheduling abilities in the VMkernel.

By clicking the Performance tab for a host or virtual machine, you can learn a wealth of information. The default view for either a host or a virtual machine is CPU consumption. But before we analyze the consumption, we need to get to know the performance graphs and legends.

Performance Graphs

Starting from the top and working our way down, on the top left you'll see the host or virtual machine being monitored. Just below the tabs, the type of chart and its interval appear. The graph in Figure 11.12 is a real-time graph that shows what has occurred in the last hour. It updates every 20 seconds. You can change the interval and resource being monitored by clicking the Change Chart Options link. At the top right, we see the Refresh icon, the Save icon, and the Tear-off icon. The Refresh icon is self-explanatory, but the next two will bear some explanation a little later.

On each side of the graph are units of measure. In Figure 11.12, the counters selected are measured in Percent and MHz. Depending on the counters chosen, there may be only one unit of measurement, but no more than two. Next, on the horizontal axis, is the Time interval. Below that, the Performance Chart Legend provides color-coded keys to help the user find a specific object or item of interest. This area also breaks down the graph into the object being measured, the measurement being used, the units of measure, and the Latest, Maximum, Minimum, and Average measurements recorded for that object.
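The four summary columns in the Performance Chart Legend are simple statistics over the plotted samples. A minimal sketch, using made-up CPU usage readings:

```python
def legend_row(samples):
    """Compute the Latest, Maximum, Minimum, and Average legend values."""
    return {
        "Latest": samples[-1],
        "Maximum": max(samples),
        "Minimum": min(samples),
        "Average": sum(samples) / len(samples),
    }


# Hypothetical CPU usage readings in percent, oldest first.
cpu_usage_pct = [12, 18, 35, 22, 13]
print(legend_row(cpu_usage_pct))
```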

Figure 11.12 A Performance graph for a single virtual machine.


Hovering the mouse over the graph at a particular recorded interval of interest displays the data points at that specific moment in time, as shown in Figure 11.13.

Another nice feature of the graphs is the ability to emphasize a specific object so that it is easier to pick out this object from the other objects. By clicking the specific key at the bottom, the key and its color representing a specific object will be emphasized while the other keys and their respective colors become lighter and less visible, as shown in Figure 11.14.

Figure 11.13 Hovering the mouse over a specific data point on the graph will display specific measurements for that particular point in time.


Figure 11.14 By clicking on a specific key in the Performance Chart Legend, that key will be emphasized while the other keys become less visible.


If the current graph does not reveal the data you were looking to find, click Change Chart Options at the top of the Performance chart to open a dialog box, shown in Figure 11.15, which lets you select from among many counters across the various physical hardware components.

On the left, you can choose which resource (CPU, Disk, Memory, Network, or System) to monitor or analyze. By selecting one of these options, you then have a choice of intervals to look at. Real-time will show you what has occurred in the last hour. The others are self-explanatory. For trend analysis, having all of these interval options allows you to choose exactly which interval you need. If these intervals are still not precise enough, you can create a custom interval by selecting the Custom option under any of the available Chart Options, as shown in Figure 11.16.

Figure 11.15 The Customize Performance Chart dialog box has many options.


Figure 11.16 Setting a custom interval is easy. Here, the user only wants to look at the last six hours of data.


Many times, you want to look at what is happening now or what happened in the last hour for a host or a virtual machine. The Real-time interval gives the best view. This view also gives you access to certain counters for each resource type for a given host or virtual machine that are not available in the other views. If a particular counter is new to you, click on it to highlight the counter. At the bottom of the dialog box, in a section called Rollup, you'll see a description of the counter. For a host, the objects that can be monitored are the host as a whole and the individual physical devices. For the host chart option CPU, the objects and counters breakdown is shown in Figure 11.17.

Figure 11.17 A host's CPU objects and counters.


Viewing Objects and Counters

To get a clear view of all the objects and counters in the Customize Performance Chart dialog box, you have to stretch the dialog box at the top or the bottom to lengthen it. This makes all of the counters visible. Sadly, there is no way to increase the length of the Objects section to see all of them, so scrolling is necessary.

For a host chart option Disk, the objects and counters are shown in Figure 11.18. The devices being monitored reflect the vmhba paths used by the VMkernel to access the disks associated with the virtual machine. Remember that the vmhba paths that are defined can use any of three types of controllers: a local SCSI controller, a Fibre Channel adapter, or an iSCSI controller (if the controller is based on the iSCSI software initiator, the device name will be vmhba32).

No NAS Monitoring Objects

NAS cannot be monitored because the connection to NAS is not local SCSI, fibre channel, or iSCSI. Even if the ESX Server is configured to use NAS, that storage medium has no object listing or counters since all of the counters seem to be based on block storage only.

Figure 11.18 A host's Disk objects and counters.


For a host chart option Memory, the objects and counters are shown in Figure 11.19. In this case, there is only one object, the host itself.

For a host chart option of Network, shown in Figure 11.20, notice the objects are the host itself and any vmnic devices used by the VMkernel. If you have a particular virtual switch you want to monitor, find out which vmnic(s) are associated with that vSwitch by switching your view in the VI Client to the Configuration tab of the ESX host and then selecting Networking in the Hardware section.

For a host chart option System, the objects and counters are shown in Figure 11.21. There are several objects for the host with this chart option. This allows the administrator to see how many physical CPU cycles are being consumed by certain processes like vmware-authd or drivers. Be aware that many of the objects may not have any data associated with them.

For a virtual machine, the objects that can be monitored are the virtual machine as a whole and the individual virtual devices. For a virtual machine chart option CPU, the objects and counters breakdown is shown in Figure 11.22. Notice that the virtual device is the virtual processor 0, not the physical processor 0. If the virtual machine were configured with more than one processor, the virtual processors would be numbered sequentially: 0, 1, 2, 3, and so on.

Figure 11.19 A host's Memory objects and counters.


Figure 11.20 A host's Network objects and counters.


Figure 11.21 A host's System objects and counters.


Figure 11.22 A virtual machine's CPU objects and counters.


For a virtual machine chart option Disk, the objects and counters breakdown is shown in Figure 11.23. Notice the virtual device is described as vmhba40:0:0, which corresponds to a VMkernel storage device, in this case, an iSCSI software initiator.

For a virtual machine chart option Memory, the objects and counters breakdown is shown in Figure 11.24. The only object is the virtual machine itself, but there are several counters.

Figure 11.23 A virtual machine's Disk objects and counters.


Figure 11.24 A virtual machine's Memory objects and counters.


For a virtual machine chart option Network, the objects and counters breakdown is shown in Figure 11.25. Notice that the virtual network device is described as 4000. If the virtual machine had two or more network adapters, they would be incremented by one for each — for example, 4001, 4002, 4003, 4004.

Figure 11.25 A virtual machine's Network objects and counters.


For a virtual machine chart option System, the object is the virtual machine itself, with counters of Heartbeat and Uptime, as shown in Figure 11.26.

Once you have selected the options you need for your graph, click OK.

Figure 11.26 A virtual machine's System objects and counters.


Now that we are back to the graph, the Save icon in the top-right corner saves the graph into an Excel-formatted file. In the Time section of the Export Performance dialog box shown in Figure 11.27, choose the currently displayed interval or a different time frame.

Next, in the Chart Options section, select the type of graph to be saved: Line Graph or Stacked Graph. Figure 11.28 shows an example of a line graph.

Figure 11.27 Saving a Performance graph. Be sure to select the interval and chart options you want. Also, you have a choice of the chart type and size.


Figure 11.28 An example of the line graph for a virtual machine.


A stacked graph of the same data looks as shown in Figure 11.29.

Figure 11.29 An example of a stacked graph for a virtual machine — same data, just a different view.


Stacked graphs are useful when looking at several virtual machines on the same host. Stacking the virtual machines on top of one another shows their aggregate usage, with colors to distinguish between them. This allows you to identify the busiest virtual machine or the one that represents the largest load on the server. Also, you can click the Advanced button to select or deselect specific counters to be saved as part of the Excel graph.
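What a stacked graph conveys can be reproduced numerically: the top of the stack at each interval is the sum across virtual machines, and the thickest band belongs to the busiest one. The virtual machine names and MHz readings below are invented for illustration.

```python
# Hypothetical CPU usage in MHz for three virtual machines on one host,
# sampled at the same three points in time.
usage_mhz = {
    "vm-a": [200, 350, 300],
    "vm-b": [900, 1100, 1000],
    "vm-c": [150, 100, 200],
}


def stacked_totals(series):
    """Aggregate usage at each sample point (the top edge of the stack)."""
    return [sum(values) for values in zip(*series.values())]


def busiest(series):
    """Name of the virtual machine with the largest total usage."""
    return max(series, key=lambda name: sum(series[name]))


print(stacked_totals(usage_mhz))
print(busiest(usage_mhz))
```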

Here's another nice feature: click the Popup Chart icon to display the graph in a new window. This lets you easily compare this host or virtual machine with another host or virtual machine, as shown in Figure 11.30.

Figure 11.30 An example of the Popup Chart feature.

esxtop

Virtual machine performance can also be monitored using a command-line tool named esxtop. A great reason to use esxtop is the immediate feedback it gives you after you've made an adjustment to a virtual machine. The esxtop command is not new, but it does include some new features and capabilities. All four resource types (CPU, Disk, Memory, and Network) can be monitored for a particular ESX host.

With the new and improved esxtop we can look at two counters, CPU Used and Ready Time. We can also see these counters in the virtual machine graphs, but with this tool they are calculated as percentages. If virtual machines are running on the same host, we can easily compare the virtual machines as they are identifiable by name, unlike in previous versions. If they run on two different hosts, then a direct comparison is harder without opening two different SSH client sessions. Even then, the virtual machines are not in direct competition, so an apples-to-apples comparison may be difficult. In Figure 11.31, esxtop is monitoring the CPU usage of several virtual machines running on the same host, which is the default view. The amount of processor time being used, %USED, and the amount of time spent ready to run but not getting scheduled, %RDY, are listed for each virtual machine. If we increase a virtual machine's reservation or shares, we should see a corresponding change in these two fields: %USED should rise and %RDY should fall. If you are looking at one of the other resource screens, press c (lowercase) to return to the CPU resource screen.

Figure 11.31 The command-line tool esxtop showing CPU statistics for several virtual machines and VMkernel processes.


ESXTOP

Remember, esxtop only shows a single ESX host. In a virtual infrastructure where VMotion, DRS, and HA have been deployed, virtual machines may move around often. Making reservation or share changes while the virtual machine is currently on one ESX Server may not have the desired consequences if the virtual machine is moved to another server and the mix of virtual machines on that server represents different performance loads.

To monitor memory usage, press m (lowercase). This gives you real-time statistics about the ESX Server's memory usage in the top portion and the virtual machines' memory usage in the lower section, as shown in Figure 11.32.

To monitor network statistics about the vmnics, individual virtual machines, or VMkernel ports used for iSCSI, VMotion, and NFS, press n (lowercase). The columns showing network usage, as shown in Figure 11.33, include packets transmitted and received and megabytes transmitted and received for each vmnic or port. Also shown in the DNAME column are the vSwitches and, to the left, what is plugged into them, including virtual machines, VMkernel, and service console ports. If a particular virtual machine is monopolizing the vSwitch, you can look at the amount of network traffic on a specific switch and the individual ports to see which virtual machine is the culprit.

Figure 11.32 Using esxtop to display memory usage on a single ESX host.


Figure 11.33 Using esxtop to display network statistics on an ESX Server.


To monitor disk I/O statistics about each of the SCSI controllers, press d (lowercase). The output of disk I/O monitoring with esxtop is shown in Figure 11.34. The columns on the far left are most often used to determine disk loads. Those columns show loads based on reads and writes per second and megabytes read and written per second. Another important column is NVMS, which shows how many virtual machines are sharing the same controller. If an application in one of the virtual machines is sluggish, this column makes it easy to see how many other virtual machines it may be competing with.

Figure 11.34 Using esxtop to display controller I/O statistics.


Another great feature of esxtop is the ability to capture performance data for a short period of time and then play back that data. Using the command vm-support, you can set an interval and duration for the capture.

Perform these steps to capture data to be played back on esxtop:

1. While logged in as root or after switching to root user, change your working directory to /tmp by issuing the command cd /tmp.

2. Issue the command vm-support -S -i 10 -d 180. This creates an esxtop snapshot, capturing data every ten seconds, for the duration of 180 seconds, as shown in Figure 11.35.

Figure 11.35 Capturing data for esxtop.


3. The resulting file is a gzipped tarball. It must be extracted with the command tar -xzf esx*.tgz. This will create a vm-support directory that will be called in the next command.

4. Run esxtop -R /vm-support* to replay the data for analysis, as shown in Figure 11.36.

Figure 11.36 Replaying the data in esxtop.
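The interval and duration values in step 2 determine how much data the replay contains. The Python sketch below simply assembles the command shown in the steps and estimates the number of samples; the helper name is hypothetical, and the sample count is approximate because it may differ by one depending on when the first sample is taken.

```python
def capture_plan(interval_s, duration_s):
    """Build the vm-support command line and estimate the sample count."""
    return {
        "command": f"vm-support -S -i {interval_s} -d {duration_s}",
        "approx_samples": duration_s // interval_s,
    }


plan = capture_plan(10, 180)
print(plan["command"])         # vm-support -S -i 10 -d 180
print(plan["approx_samples"])  # 18
```

A shorter interval gives finer-grained playback at the cost of a larger capture file.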

Monitoring Host and Virtual Machine CPU Usage

When monitoring a virtual machine, it's always a good starting point to keep an eye on CPU consumption. Many virtual machines started out in life as underperforming physical servers. One of VMware's most successful sales pitches is being able to take all those lackluster physical boxes that are not busy and convert them to virtual machines. Thus, a server starts life physical, but then goes through a midlife crisis and converts to the religion of virtualization. Once converted, virtual infrastructure managers tend to think of these new converts as simple, lackluster, and low-utilization servers with nothing to worry over or monitor. The truth, though, is quite the opposite.

When the server was physical, it had an entire box to itself. Now it must share its resources with many other siblings. In aggregate, they represent quite a load and if some or many of them become somewhat busy, they fight with each other for the finite capabilities of the ESX Server they run on. Of course, they don't know they are fighting, but the VMkernel tries to placate them. Virtual CPUs need to be scheduled, and it does a remarkable job given that there are more virtual machines than physical processors most of the time. But in every virtual infrastructure manager's life, there comes a day when one virtual machine becomes unhappy. Usually it sends a surrogate, the application owner, to tell the manager that this server was a lot happier when it was physical. But since the conversion, life has not been the same. It is now time for the manager to convert to the religion of virtual machine monitoring and figure out what is making this virtual machine so unhappy. Thankfully, VMware Infrastructure 3 (VI3) has all the tools to make such monitoring and analysis very easy.

Let's begin with a hypothetical scenario. A help desk ticket has been submitted indicating that an application owner isn't getting the expected level of performance on a particular server, which in this case is a virtual machine. A virtual infrastructure manager needs to delve deeper into the problem, asking as many questions as necessary to discover what the application owner needs to be happy. Some performance issues are subjective, meaning some users may complain about the slowness of their applications, but they have no objective benchmark for such a claim. Other times, this is reflected in a specific benchmark, such as the number of transactions by a database server or throughput for a web server. In this case, our issue revolves around benchmarking CPU usage, so our application is CPU-processing intensive when it does its job.

Assessments, Expectations, and Adjustments

If an assessment was done prior to virtualizing a server, there may be hard numbers to look at to give some details as to what was expected with regard to minimum performance or service-level agreement (SLA). If not, the virtual infrastructure manager needs to work with the application's owner to make more CPU resources available to the unhappy virtual machine when needed.

VirtualCenter's graphs, which we have explored in great detail, are the best way to analyze usage, both short- and long-term. The great thing about the graphs is that they can tell a story about how the virtual machine has performed in the last hour, day, week, month, or even year. Maybe the help desk ticket describes a slowness issue in the last hour. As Figure 11.37 shows, we can look at the virtual machine's ability to work for the last hour.

Perform these steps to create a CPU graph:

1. Connect to the VirtualCenter server or an individual ESX Server host with the VI Client.

2. In the inventory tree, click on a virtual machine. This shows you the Summary tab. Click the Performance tab, as shown in Figure 11.37.

3. Click the Change Chart Options link.

4. Under Chart Options, click on the CPU bull's-eye or the + sign, as shown in Figure 11.38. This view allows you to select the objects and counters that will provide the most relevant information. Real-time CPU graphs have a lot of counters, but usually only a few are used for any one graph. In this case, we'll use CPU Usage in MHz (Average/Rate) and CPU Ready to see how much physical processor is being used and how long, on average, the virtual machine waits to be scheduled on a physical processor.

Figure 11.37 The Performance tab is your starting point.


Figure 11.38 The default resource for a virtual machine is CPU.


CPU Ready

CPU Ready is a special counter only available using the Real-time interval. A virtual machine waiting many thousands of milliseconds to be scheduled on a processor may indicate that the ESX Server is overloaded, a resource pool has too tight a limit, the virtual machine has too few CPU shares, or, if no one is complaining, nothing at all. Be sure to work with the server or application owner to determine an acceptable amount of CPU Ready for any CPU-intensive virtual machine.

5. Select those relevant objects and counters to provide the information needed, as shown in Figure 11.39. Then, choose the chart type and click OK.

Figure 11.39 The graph shows the virtual machine's CPU consumption and how long, in milliseconds, it takes to schedule the virtual machine on a physical processor.
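
Because CPU Ready is reported in milliseconds per sample, it can be easier to compare virtual machines if the value is expressed as a percentage of the sample interval. The following is only a minimal sketch: it assumes each real-time sample covers 20 seconds, and the function name and sample values are hypothetical rather than anything provided by VirtualCenter.

```python
# Sketch: express CPU Ready (milliseconds per sample) as a percentage of the
# sample interval. The 20-second interval is an assumption about the real-time
# chart; adjust it if your samples cover a different period.

REALTIME_INTERVAL_SECONDS = 20  # assumed length of one real-time sample


def cpu_ready_percent(ready_ms, interval_seconds=REALTIME_INTERVAL_SECONDS):
    """Return the share of the interval the VM spent waiting to be scheduled."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return ready_ms / (interval_seconds * 1000.0) * 100.0


# Hypothetical samples read off a chart, in milliseconds.
samples_ms = [150, 400, 2000, 4500]
for ms in samples_ms:
    print(f"{ms} ms ready -> {cpu_ready_percent(ms):.2f}% of the interval")
```

What counts as an acceptable percentage is still something to agree on with the server or application owner; the conversion only makes the numbers easier to discuss.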


Monitoring a host's overall CPU usage is fairly straightforward. Keep in mind that other factors usually come into play when looking at spare CPU capacity. Add-ons such as VMotion, DRS, and HA directly impact whether there is enough spare capacity on a server or a cluster of servers. With ESX 3.x, the Service Console will usually not be as competitive for processor 0 since there are fewer processes to consume CPU time. Agents installed on the Service Console will have some impact, again on processor 0.

Service Console Stuck on 0

The Service Console, as noted, uses processor 0. But note that it will only use processor 0. The Service Console does not get migrated to other processors even in the face of heavy contention.

Follow these steps to create a real-time graph for a host's CPU usage:

1. Connect to the VirtualCenter server or an individual ESX Server host with the VI Client.

2. In the inventory tree, click on a host. This shows you the Summary tab. Click the Performance tab, as shown in Figure 11.40.

3. Click the Change Chart Options link.

4. Under Chart Options, click on the CPU bull's-eye or the + sign, as shown in Figure 11.41. The objects to choose from are the physical, hyper-threaded, or core processors. There are two often-used counters in this customization dialog box: CPU Usage (Average/Rate) and CPU Reserved Capacity. Both are used to see how an individual ESX Server is being utilized. CPU Usage shows actual usage, and CPU Reserved Capacity shows how much capacity is left.

Figure 11.40 The Performance tab is the central focus of obtaining information about virtual machine or host performance levels.


Figure 11.41 Looking at host processor usage and spare capacity.


CPU Reserved Capacity

The CPU Reserved Capacity counter can be used to monitor spare capacity for a single ESX Server. However, if DRS has been enabled, the Resource Allocation tab for the Cluster inventory object is more relevant since it shows cluster usage and spare capacity at a glance for all servers in a cluster.

5. Select those relevant objects and counters to provide the information required, as shown in Figure 11.42. If necessary, choose the chart type. Click OK.

Figure 11.42 This graph displays the host's overall usage and reserve capacity.


VMkernel Balancing Act

Always remember that on an oversubscribed ESX Server, the VMkernel will load balance the virtual machines based on current loads, reservations, and shares represented on individual virtual machines and/or resource pools.
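
To get a feel for what shares mean under contention, consider a deliberately simplified model in which every virtual machine wants more CPU than it can get and the host's capacity is divided purely in proportion to shares. This sketch is not the VMkernel's scheduling algorithm: it ignores reservations, limits, and current load, and the virtual machine names, share values, and capacity figure are hypothetical.

```python
# Sketch: divide a host's CPU capacity among fully contending virtual machines
# in proportion to their shares. This is a simplification for illustration;
# reservations, limits, and actual demand are intentionally left out.


def split_by_shares(capacity_mhz, shares_by_vm):
    """Return a dict of VM name -> MHz, proportional to each VM's shares."""
    total_shares = sum(shares_by_vm.values())
    if total_shares <= 0:
        raise ValueError("total shares must be positive")
    return {
        name: capacity_mhz * shares / total_shares
        for name, shares in shares_by_vm.items()
    }


# Hypothetical example: one VM with twice the shares of the other two.
allocation = split_by_shares(8000, {"db": 2000, "web": 1000, "test": 1000})
for name, mhz in allocation.items():
    print(f"{name}: {mhz:.0f} MHz")
```

In this model the virtual machine with twice the shares receives twice the CPU of each of its siblings, but only while all of them are competing; when the host has spare capacity, shares do not hold anyone back.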

By looking at the Resource Allocation tab for a Cluster or Resource Pool inventory object, we can get a picture of how CPU resources are being used for the entire pool (see Figure 11.43). This high-level method of looking at resource usage is useful for analyzing overall infrastructure utilization. This tab is also a convenient place to adjust individual virtual machine or resource pool reservations, limits, and/or shares without editing each object independently.

Figure 11.43 This tab displays the cluster's overall usage and reserve capacity.

Monitoring Host and Virtual Machine Memory Usage

Monitoring memory usage, whether on a host or a virtual machine, can be challenging. The monitoring itself is not difficult; it's the availability of the physical resource that can be a challenge. Of the four resources, memory is the easiest to oversubscribe. Depending on the physical form-factor chosen to host ESX Server 3.x, running out of physical RAM is easy to do. Although the blade form-factor creates a very dense consolidation effort, the blades sometimes are constrained by the amount of physical memory and network adapters that can be installed. But even with other regular form-factors, having enough memory installed comes down to how much the physical server can accommodate and your budget.

Many virtual machines do not need a great deal of memory to do their jobs effectively. If an assessment has been done prior to consolidation efforts, the amount of memory being used by any one server can be identified up front. Once the server has been converted, you can edit the virtual machine's settings to a value more in tune with its actual usage. This allows you to consolidate more servers onto fewer hosts. But what if the server is new to the organization and does not have a track record? What amount of memory should the virtual machine be configured with? Should you use the application's vendor recommendation? Should you start low and increase later? Or start high and reduce later?

Pre-ESX Server Baseline: Scenario 1

If the server was first deployed as a physical server with a given role, there might be some record of usage before it was converted into a virtual machine. Many times physical servers are purchased with more memory than what the application needs or uses. This may be done due to hardware vendor inducements during purchasing, or standardization on a specific hardware model and build within an organization. Or it could be a case of memory inflation, where a customer goes by the application vendor's recommendations. These recommendations can be inflated to make sure the hardware will not be an issue when it comes to performance.

VMware provides a service, Capacity Planner, that assesses the physical server's resource usage prior to conversion. Even with other products or tools, assessing a physical server's memory usage during peak intervals is relatively easy. Once the physical server has been converted, the virtual machine can be configured with a new limit that reflects actual usage and not a guess. Some servers respond well in this scenario. Others start well enough, but over time increase their memory usage. This is where using VirtualCenter's graphs can be very helpful. Since the charts can be modified to reflect long-term trends and usage, it is easy to identify a virtual machine that has an increased need for more memory or, in some cases, the reverse.

New Servers: Scenario 2

The second scenario is harder to predict. A new server with no track record leaves much to the discretion of the virtual infrastructure manager. Using the application vendor's recommendation may be the best start for the virtual machine. Once the virtual machine has been live for some period of time, VirtualCenter's graphs can help determine what the actual usage is over a particular time frame and make adjustments accordingly.
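
In either scenario, the adjustment comes down to comparing observed peak usage with what the virtual machine was configured with. The sketch below shows one way that arithmetic might look. The 25 percent headroom, the rounding to a multiple of 4 MB, the function name, and the sample values are all assumptions chosen for illustration, not recommendations from VMware.

```python
import math

# Sketch: suggest a configured memory size from observed usage samples.
# The headroom and rounding step are assumptions; pick values that match
# your own service levels.


def suggest_memory_mb(usage_samples_mb, headroom=0.25, step_mb=4):
    """Return peak usage plus headroom, rounded up to a multiple of step_mb."""
    if not usage_samples_mb:
        raise ValueError("need at least one usage sample")
    peak = max(usage_samples_mb)
    target = peak * (1.0 + headroom)
    return int(math.ceil(target / step_mb) * step_mb)


# Hypothetical peak-interval readings taken from a memory graph, in MB.
samples = [610, 742, 698, 805, 760]
print("Suggested configuration:", suggest_memory_mb(samples), "MB")
```

A longer observation window gives a more trustworthy peak, which is why the longer chart intervals are worth checking before settling on a number.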

Perform these steps to create a real-time graph for a virtual machine's memory usage:

1. Connect to the VirtualCenter server or an individual ESX Server host with the VI Client.

2. In the inventory tree, click on a virtual machine. This shows you the Summary tab. Click the Performance tab, as shown in Figure 11.44.

Figure 11.44 The Performance tab can be altered from the default CPU monitoring to support custom charting needs.


3. Click the Change Chart Options link.

4. Under Chart Options, click on the Memory bull's-eye or the + sign. This view allows you to select the objects and counters that provide the most relevant information, as shown in Figure 11.45. Real-time memory graphs have a lot of counters, but usually only a few are used for any one graph. In this case, we'll use Memory Usage (Average/Absolute), Memory Overhead (Average/Absolute), and Memory Consumed (Average/Granted) to get a clear picture of memory utilization as it relates to host consumption and relative to what the virtual machine was configured with.

5. In Figure 11.46, the graph shows virtual machine memory consumption. Many times, you may like to know how much overhead is associated with a virtual machine or how much memory the virtual machine is using compared to what it was configured with. Choose the chart type and click OK.

When monitoring a host server's memory usage, overall utilization is important to watch. As explained earlier in this chapter, creating alarms to alert you when certain conditions arise is one way of monitoring your hosts. In addition to alarms, occasionally looking at host graphs as they pertain to memory usage will give you some perspective as to what is normal on your servers and what is abnormal or weird. There are even more counters to choose from when customizing your graphs, but, again, you will usually select just a few for any one graph.

Perform these steps to create a real-time graph for a host's memory usage:

1. Connect to the VirtualCenter server or an individual ESX Server host with the VI Client.

2. In the inventory tree, click on a host. This shows you the Summary tab. Click the Performance tab, as shown in Figure 11.47.

Figure 11.45 Selecting objects and counters to monitor memory usage for a virtual machine.


Figure 11.46 Select those relevant objects and counters to provide the information needed.


3. Click the Change Chart Options link.

4. Under Chart Options, click on the Memory bull's-eye or the + sign. This view allows you to select the objects and counters that provide the most relevant information, as shown in Figure 11.48. Real-time memory graphs have a lot of counters, but usually only a few are used for any one graph. In this case, we'll use Memory Usage (Average/Absolute), Memory Overhead (Average/Absolute), Memory Active (Average/Absolute), Memory Consumed (Average/Granted), and Memory Used by VMkernel to get a clear picture of memory utilization as it relates to the host.

Figure 11.47 The Performance tab for a host defaults to monitoring information about CPU usage on the entire host.


Figure 11.48 Setting up the memory options for a host's memory graph.


Counters, Counters, and More Counters

As with virtual machines, there is a plethora of counters that can be utilized with a host for monitoring memory usage. Which ones you select will depend on what you're looking for. Straight memory usage monitoring is common, but don't forget that there are other counters that could be helpful, such as Ballooning, Unreserved, VMkernel Swap, and Shared, just to name a few. The ability to assemble the appropriate counters for finding the right information comes with experience and depends on what is being monitored.

5. The graph, shown in Figure 11.49, shows host overhead and virtual machine memory consumption. Many times, you may like to know how much overhead is associated with a virtual machine or how much memory the virtual machine is using compared to what it was configured with. Click OK.

Figure 11.49 Select those relevant objects and counters to provide the information needed. If necessary, choose the chart type.

Monitoring Host and Virtual Machine Network Usage

VirtualCenter's graphs provide a wonderful tool for measuring a virtual machine's or a host's network usage.

Monitoring network usage requires a slightly different approach than monitoring CPU or memory. With either CPU or memory, reservations, limits, and shares can dictate how much of these two resources can be consumed by any one virtual machine. Network usage cannot be constrained by any of those mechanisms. Since virtual machines plug into a virtual machine port group, which is part of a vSwitch on a single host, how the virtual machine interacts with the vSwitch can be manipulated by the virtual switch's or port group's policy. For instance, if the administrator needs to restrict a virtual machine's overall network output, Traffic Shaping can be configured on the vSwitch, but more likely on the port group, to restrict the virtual machine to a specific amount of outbound bandwidth. There is no way to restrict virtual machine inbound bandwidth on the ESX Server.

Virtual Machine Isolation

Certain virtual machines may indeed need to be limited to a specific amount of outbound bandwidth. Servers such as FTP, file and print, web and proxy servers, or any server whose main function is to act as a file repository or connection broker may need to be limited or traffic shaped to an amount of bandwidth that allows it to meet its service target, but not to monopolize the host it runs on. Isolating any of these virtual machines to a vSwitch of its own is likely a better solution, but it requires the appropriate hardware configuration.

You can measure a virtual machine's or a host's output or reception of network traffic using the graphs in VirtualCenter. The graphs can provide accurate information on the actual usage, or ample evidence that a particular virtual machine is monopolizing a virtual switch, especially using the Stacked Graph chart type. Figure 11.50 shows a virtual machine's network utilization.

Figure 11.50 An example of a virtual machine's network usage.


Perform the following steps to create a real-time graph for a virtual machine's transmitted network usage:

1. Connect to the VirtualCenter server or an individual ESX Server host with the VI Client.

2. In the inventory tree, click on a virtual machine. This shows you the Summary tab. Click the Performance tab, shown in Figure 11.51.

3. Click the Change Chart Options link.

4. Under Chart Options, click on the Network bull's-eye or the + sign.

5. By selecting the Real-time option on the left, you can then choose which objects and counters on the right you want to monitor. In this example, let's say you want to monitor a virtual machine's outbound bandwidth. Select the 4000 object and then select Network Data Transmit Rate and Network Packets Transmitted (see Figure 11.52). These two counters will give you a great window into how much network bandwidth this particular virtual machine is consuming in the outbound direction. Click OK.

Figure 11.51 To begin customizing a Network graph, click the Performance tab.


Figure 11.52 Changing the graph view to network-related information.


6. The Network Data Transmit Rate is in kilobytes/second and its unit of measurement is shown on the left as KBps. The Network Packets Transmitted represents the number of packets being transmitted. The unit of measurement is Number and appears on the right. By hovering your mouse over any data point in the graph, you can find out how much bandwidth is being consumed or how many packets are being transmitted, as shown in Figure 11.53.

Figure 11.53 This graph shows real-time outbound network usage using the mouse pointer to show a specific data point.
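
Since the graph reports KBps while physical links are usually described in megabits per second, a quick conversion helps put a data point in context. This is a minimal sketch under stated assumptions: it treats 1 KB as 1,024 bytes, assumes a 1,000 Mbps uplink, and uses hypothetical function names and a hypothetical reading.

```python
# Sketch: convert a Network Data Transmit Rate reading (KBps) to megabits per
# second and compare it with the speed of the uplink. The 1,024-byte kilobyte
# and the 1,000 Mbps link speed are assumptions.


def kbps_to_mbps(kilobytes_per_second, bytes_per_kb=1024):
    """Convert kilobytes per second to megabits per second."""
    return kilobytes_per_second * bytes_per_kb * 8 / 1_000_000


def link_utilization_percent(kilobytes_per_second, link_mbps=1000):
    """Return the reading as a percentage of the link's nominal speed."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    return kbps_to_mbps(kilobytes_per_second) / link_mbps * 100.0


reading_kbps = 12500  # hypothetical data point taken from the graph
print(f"{kbps_to_mbps(reading_kbps):.1f} Mbps, "
      f"{link_utilization_percent(reading_kbps):.2f}% of the link")
```

The same conversion is handy when deciding on a Traffic Shaping value for a port group, because the observed rate and the configured limit then share a unit.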


When looking at historical network data usage for a virtual machine, there is only one object, the virtual machine, and one counter that can be used, Network Usage (Average/Rate). This counter will show average aggregated usage, both received and transmitted, in KBps (see Figures 11.54 and 11.55).

Figure 11.54 Choosing the chart option Past Day, the object virtual machine, and the counter Network Usage (Average/Rate).


Follow these steps to create a real-time graph for a host's transmitted network usage:

1. Connect to the VirtualCenter server or an individual ESX Server host with the VI Client.

Figure 11.55 Graph showing this virtual machine's network usage for the past day.


2. In the inventory tree, click on an ESX host. This will show you the Summary tab in the Details section on the right. Select the Performance tab, shown in Figure 11.56.

Figure 11.56 Changing to the Performance tab for a host.


3. Click the Change Chart Options link.

4. Under Chart Options, click on the Network bull's-eye or the + sign, shown in Figure 11.57.

5. By selecting the Real-time option on the left, you can then choose which objects and counters on the right you want to monitor. In this example, let's say you want to monitor a host's outbound bandwidth per vmnic. Select the vmnic objects and then select Network Data Transmit Rate and Network Packets Transmitted (see Figure 11.58). Click OK.


Figure 11.57 Setting up a graph to show a host's network usage.


Figure 11.58 Choosing specific vmnics provides a way to see overall network traffic for a virtual switch or NIC team.


6. Very much like the earlier example for a virtual machine, these two counters will give you a window into how much network activity is occurring on this particular host in the outbound direction for each vmnic. This is especially relevant if you want to see different rates of usage for each physical network interface, which, by definition, represents different virtual switches or NIC teams. An example of network monitoring using the Performance tab is shown in Figure 11.59.


Figure 11.59 Graph displaying transmission data for all vmnics on the ESX host.


What if you wanted to see which virtual machine was producing the most network activity on an ESX host? Change the graph to display the virtual machines as objects by choosing the Stacked Graph (Per VM) chart type. This allows the administrator to select all or only those virtual machines to monitor for network activity. Be aware, though, that the only counter available with a stacked graph is Network Usage (Average/Rate). See Figures 11.60 and 11.61.

Figure 11.60 Choosing a Stacked Graph chart type allows you to make quick comparisons between the virtual machines running on an ESX host.


Figure 11.61 Here is a stacked graph comparing each virtual machine and showing aggregate usage of all virtual machines running on a specific ESX host. This graph emphasizes the DC virtual machine.

Monitoring Host and Virtual Machine Disk Usage

Monitoring a host's controller or virtual machine's virtual disk usage is similar in scope to monitoring network usage. This resource, which represents a controller or the storing of a virtual machine's virtual disk on a type of supported storage, isn't constrained by reservations or limits the way CPU and memory are. The only way to influence a virtual machine's disk activity is to assign shares on the individual virtual machine, which in turn may have to compete with other virtual machines running from the same storage volume. VirtualCenter's graphs come to our aid again in showing actual usage for both ESX hosts and virtual machines.

Using the graph in Figure 11.62, we can monitor a host's overall controller activity. This view doesn't allow us to see why the activity is occurring or which virtual machine is generating the activity, but if we are looking for a particular host with suspicious disk I/O activity, this is a starting point.

Figure 11.62 VirtualCenter graphs can report on host or virtual machine performance.


Perform these steps to create a host graph showing disk controller utilization:

1. Connect to the VirtualCenter server or an individual ESX Server host with the VI Client.

2. In the inventory tree, click on an ESX host. This shows you the Summary tab in the Details section on the right. Select the Performance tab.

3. Click the Change Chart Options link.

4. Under Chart Options, click on the Disk bull's-eye or the + sign, shown in Figure 11.63. Chart Options allows you to choose the interval of time to monitor. For this example, we chose Real-time, which shows Disk activity for the last hour.

Figure 11.63 Creating a real-time graph for monitoring host controller utilization.


5. Selecting an object or objects, in this case a controller, and a counter or counters lets you monitor for activity that is interesting or necessary to meet service levels. In the custom chart selection shown in Figure 11.64, selecting the objects vmhba0:0:0 and silo104.vdc.local, then selecting counters Disk Read Rate, Disk Write Rate, and Disk Usage (Average/Rate) will give an overall view of the activity for controller 0.

6. In reviewing the graph in Figure 11.65, we discover that the host isn't generating much Read activity, except for one spike around 9:55 AM. Write activity isn't much either, and the Disk Usage counter, which aggregates the data for both Read and Write, is showing a low pattern of activity.

Host disk I/O is in a controller context, but we can switch to a Stacked Graph view (see Figure 11.66) that allows us to see each virtual machine's activity stacked on top of one another to see their aggregate usage as well.

Figure 11.64 Setting up the objects and counters for a real-time graph for a specific controller.


Figure 11.65 Controller usage graph for vmhba0:0:0.


Figure 11.66 Switching to Stacked Graph per-virtual-machine view lets us make quick comparisons or find a virtual machine that is monopolizing its volume. This particular graph shows very little disk activity for either virtual machine.


Stacked Views

A stacked view is very helpful in identifying whether one particular virtual machine is monopolizing a volume. Whichever virtual machine has the tallest stack in the comparison may be degrading the performance of other virtual machines' virtual disks.
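
The comparison a stacked view invites can also be done with numbers: take each virtual machine's usage, work out its share of the total, and see whether one of them dominates. The following sketch assumes you have copied per-virtual-machine readings off the graph by hand; the virtual machine names, values, 50 percent threshold, and function names are hypothetical.

```python
# Sketch: find out whether one virtual machine accounts for most of the
# activity on a volume. Input is a dict of VM name -> usage (for example KBps)
# taken from a stacked graph. The threshold is an arbitrary assumption.


def usage_shares(per_vm_usage):
    """Return each VM's percentage of the combined usage."""
    total = sum(per_vm_usage.values())
    if total == 0:
        return {name: 0.0 for name in per_vm_usage}
    return {name: value / total * 100.0 for name, value in per_vm_usage.items()}


def dominant_vm(per_vm_usage, threshold_percent=50.0):
    """Return the name of the VM at or above the threshold, or None."""
    shares = usage_shares(per_vm_usage)
    if not shares:
        return None
    top = max(shares, key=shares.get)
    return top if shares[top] >= threshold_percent else None


readings = {"DC": 600, "FILE01": 300, "WEB01": 100}  # hypothetical values
print(usage_shares(readings))
print("Dominant:", dominant_vm(readings))
```

A virtual machine flagged this way is not necessarily misbehaving; it is simply the first place to look when other virtual machines on the same volume slow down.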

Now let's switch to the virtual machine view. Looking at individual virtual machines for insight into their disk utilization can lead to some useful conclusions. File and print virtual machines, or any server that provides print queues or database services, will generate some disk-related I/O that needs to be monitored. In some cases, if the virtual machine is generating too much I/O, it may degrade the performance of other virtual machines running out of the same volume. Let's take a look at a virtual machine's graph.

Perform these steps to create a virtual machine graph showing real-time disk controller utilization:

1. Connect to the VirtualCenter server or an individual ESX Server host with the VI Client.

2. In the inventory tree, click on a virtual machine. This shows you the Summary tab in the Details section on the right. Select the Performance tab.

3. Click the Change Chart Options link.

4. Under Chart Options, click on the Disk bull's-eye or the + sign, as shown in Figure 11.67.

5. Using the graph objects vmhba0:0:0 and the virtual machine name, plus the counters Disk Read Rate, Disk Write Rate, and Disk Usage (Average/Rate), we can produce an informative picture of this virtual machine's disk I/O behavior. This virtual machine is busy at work generating reads and writes for its application. Does the graph show enough I/O to meet a service-level agreement or does this virtual machine need some help? The graphs allow administrators to make informed decisions, usually working with the application owners, so that any adjustments to improve I/O will lead to happy virtual machine owners.

Figure 11.67 Setting up the graph for a single virtual machine.


Figure 11.68 A virtual machine disk usage graph can help you decide if you need to make adjustments.


The graph in Figure 11.68 shows real disk I/O that may indicate that a virtual machine is performing its duties well. In addition, looking at longer intervals of time to gain a historical perspective may show that the virtual machine has become busier or fallen off its regular output and is therefore not meeting expectations. If the amount of I/O is just slightly impaired, then adjusting the virtual machine's shares may be a way to prioritize its disk I/O ahead of other virtual machines sharing the volume. The administrator may be forced to move the virtual machine's virtual disk(s) to another volume or LUN if share adjustments don't achieve the required results.
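
One way to put a number on whether a virtual machine has become busier or fallen off its regular output is to fit a straight line through its historical averages and look at the slope. This is only a sketch of that idea using an ordinary least-squares fit; the function name and the daily averages are hypothetical, and the values are assumed to be evenly spaced in time.

```python
# Sketch: estimate the trend in a series of evenly spaced averages (for
# example, daily Disk Usage in KBps) with a least-squares line. A positive
# slope suggests growing activity, a negative slope a decline.


def trend_slope(values):
    """Return the least-squares slope per sample for evenly spaced values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a trend")
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


daily_kbps = [100, 110, 120, 130, 140]  # hypothetical daily averages
print(f"Change per day: {trend_slope(daily_kbps):.1f} KBps")
```

A slope on its own does not say whether the change matters; it is a prompt to go back to the graph and to the application owner with a more specific question.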

The Bottom Line

Create an alarm. Creating host and virtual machine alarms is a proactive way to be alerted to abnormal behavior for all four resource groups or state changes. Alarms can be applied to a single host or virtual machine or a group of either object in the VirtualCenter hierarchy.

Master It Creating host and virtual machine alarms.

Work with graphs. Creating and working with Performance graphs is the best way to monitor what is currently happening in your virtual infrastructure. Maybe more important, though, is a graph's ability to analyze trends in the performance of your hosts and virtual machines. Graphs can be saved and archived or printed for justifying purchase decisions or showing before-and-after comparisons after adjustments have been made to either hosts or virtual machines.

Master It Creating graphs for hosts and virtual machines.

Master It Using esxtop to monitor resources on a single ESX host.

Customize host and virtual machine graphs for CPU, Memory, Network, and Disk.

Master It Use the graphs to monitor CPU usage regularly for hosts and the virtual machines.

Master It Create graphs showing host memory usage using the various objects, counters, and chart types.

Master It Create graphs showing virtual machine memory usage using the various objects, counters, and chart types.

Master It Create graphs showing host and virtual machine network usage using the various objects, counters, and chart types.

Master It Create graphs showing host and virtual machine disk usage using the various objects, counters, and chart types.

Save a graph. Graphs can be saved as evidence, I mean justification, as to why new hardware is required.

Master It You need to provide a graph to upper management in support of your proposal for two new servers to be configured as ESX Server hosts.
