This appendix serves as an overview of the many design, deployment, management, and monitoring concepts discussed throughout the book. It can be used as a quick reference for any phase of your virtual infrastructure deployment. The appendix is also meant as a review of the material we covered, with a focus on the concepts of VMware Infrastructure 3 (VI3) that are commonly discussed in the world of virtualization management. By reviewing the appendix, you can gauge your level of fluency with the concepts we've discussed. If you're unsure of any of the best practices outlined here, you can revisit the various sections of the book for more details about that particular best practice.
The following best practice suggestions are derived from the full details outlined in Chapters 2 and 13.
♦ Review your architecture needs to determine if ESX Server 3.5 or ESXi is the right foundation for your virtual infrastructure. Identify the answers to questions like:
♦ Do I have a need for the console operating system?
♦ Do I have a need to minimize the footprint on the physical server?
♦ Do I want to install any third-party applications that require the service console?
♦ Always consult the ESX Server compatibility guides before purchasing any new hardware. Even if you succeed in installing on unsupported hardware, be aware that support calls involving hardware not listed on the compatibility guides may be cut short. Ensure that you review the appropriate compatibility guide for the product you have chosen to install.
♦ Plan the Service Console management methods before installing. Identify the answers to questions like:
♦ Will the Service Console be on a dedicated management network or on the same network as virtual machines?
♦ Will I be using VLANs or physical hardware to segment the Service Console?
♦ How will I provide redundancy for the Service Console communication?
♦ If you're installing ESX Server 3.5, construct a Service Console security plan. Ensure limited access to the Service Console by minimizing the number of administrators with local user accounts or knowledge of the root password.
♦ Create user accounts for each administrative user who requires direct access to an ESX Server host.
♦ Establish strong user account policies in the Service Console by integrating ESX Server with Active Directory or by deploying a Linux-based security module.
♦ Establish growth projections and plan the ESX Server partition strategy accordingly.
♦ Increase the size of the root (/) partition to provide ample room for growth and/or the installation of third-party applications. If the root of the file system runs out of space, there will most certainly be issues to address.
♦ Increase the swap partition size to address any projected increases in the amount of RAM allocated to the Service Console. The swap partition should be twice the amount of RAM allocated to the Service Console.
♦ Mount the log partition at /var instead of /var/log and increase its size to provide ample room for log files and for the ESX Server patch management process, which writes to the /var directory during patching.
♦ Unless performing a boot from SAN, detach ESX Server hosts from the external storage devices to prevent overwriting existing data. At minimum, don't present LUNs to a new ESX Server host until the installation is complete.
♦ When reinstalling ESX Server on a physical server, be careful not to initialize LUNs that contain existing data. Once again, disconnecting the host from the SAN during the reinstall process eliminates the threat of erasing data.
♦ Configure a time synchronization strategy that synchronizes all ESX Server hosts with the same external time server; a sample Service Console configuration appears at the end of this list.
♦ Ensure the security of console access by guaranteeing the physical security of the box. If the server is configured with a remote console adapter, like the Dell Remote Access Controller (DRAC), ensure the default password has been changed and that the DRAC network is not readily accessible to other network segments.
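The time synchronization strategy described above can be put in place from the Service Console of an ESX Server 3.5 host. The following is a minimal sketch, assuming a hypothetical internal time server named ntp.company.local; verify the firewall service name and file paths against your build before applying it.

   # Point the Service Console NTP daemon at the corporate time source
   # (ntp.company.local is a placeholder for your own NTP server).
   echo "server ntp.company.local" >> /etc/ntp.conf
   echo "ntp.company.local" >> /etc/ntp/step-tickers

   # Open the Service Console firewall for the NTP client.
   esxcfg-firewall --enableService ntpClient

   # Restart the NTP daemon and make sure it starts with the host.
   service ntpd restart
   chkconfig ntpd on

   # Optionally write the corrected time to the hardware clock.
   hwclock --systohc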
The configuration details regarding the virtual networking best practices shown here can be found in Chapter 3.
♦ Plan the virtual-to-physical networking integration.
♦ Maximize the number of physical network adapters (Ethernet ports) to provide flexibility in the virtual networking architecture.
♦ Separate Service Console, iSCSI, NAS, VMotion, and virtual machine traffic across different physical networks where enough network adapters are available, or use a VLAN architecture to segment the traffic.
♦ Create virtual switches with VLAN IDs to provide security, segmentation, and scalability to the virtual switching architecture; a sample command sequence appears at the end of this list.
♦ Construct a virtual networking security policy for virtual switches, ports, and port groups.
♦ Create port groups for security, traffic shaping, or VLAN tagging.
♦ For optimal security, configure the virtual switch properties with the following settings:
♦ Promiscuous mode: Reject
♦ MAC Address Changes: Reject
♦ Forged Transmits: Reject
♦ Avoid VLAN IDs that have special meaning to physical switch hardware, such as VLAN 0. Virtual switches do not support the native VLAN the way physical switches do.
♦ Define traffic shaping to reduce the outbound bandwidth available either to the virtual machines that do not require full access to the bandwidth of the physical adapter or to the virtual machines that inappropriately monopolize bandwidth. Weigh the options of micro-managing virtual machine bandwidth against the configuration of NIC teams with the installation of additional network adapters.
♦ Construct NIC teams from physical adapters connected to separate bus architectures. For example, use one onboard network adapter in combination with an adapter from an expansion card. Do not use two adapters from the same expansion card, or two onboard adapters, in the same NIC team.
♦ To eliminate a single point of failure at the physical switch, connect network adapters in a NIC team to separate physical switches that belong to the same broadcast domain.
♦ Consider creating a NIC team for the Service Console. Alternatively, consider providing multiple vswif ports on different networks for redundant Service Console access.
♦ Construct a dedicated Gigabit LAN for VMotion. Ideally, all physical network adapters in the server offer gigabit speeds.
♦ Create separate networks for test and production virtual machines.
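The virtual switch, port group, and VLAN practices above map to a short sequence of Service Console commands. The following is a minimal sketch using hypothetical uplink names (vmnic1, vmnic3), a hypothetical port group (ProdVMs), and an illustrative VLAN ID (105); note that the Promiscuous Mode, MAC Address Changes, and Forged Transmits settings are configured through the VI Client rather than from the command line.

   # Create a new virtual switch and attach two uplinks from different
   # bus architectures (one onboard adapter, one from an expansion card).
   esxcfg-vswitch -a vSwitch1
   esxcfg-vswitch -L vmnic1 vSwitch1
   esxcfg-vswitch -L vmnic3 vSwitch1

   # Add a port group for production virtual machines and tag it with VLAN 105.
   esxcfg-vswitch -A ProdVMs vSwitch1
   esxcfg-vswitch -v 105 -p ProdVMs vSwitch1

   # Verify the configuration.
   esxcfg-vswitch -l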
The configuration details regarding the storage best practices shown here can be found in Chapter 4.
♦ When booting from SAN, mask each bootable LUN to be seen only by the ESX Server booting from that LUN.
♦ Build a dedicated and isolated storage network for iSCSI SAN storage to isolate and secure iSCSI storage-related traffic.
♦ Build a dedicated and isolated storage network for NAS/NFS storage to isolate and secure NAS/NFS storage-related traffic.
♦ Perform all masking at the storage device, not at the ESX Server host.
♦ Separate disk-intensive virtual machines onto different LUNs carved from separate physical disks.
♦ Provide individual zoning configurations for each ESX Server host.
♦ Allow the SAN administrators to manage LUN sizes. VMFS extents might help immediate needs, but might lead to loss of data in the event that an extent becomes corrupted or damaged.
♦ Spread the storage communication workload across the available hardware devices. For example, if the ESX Server host has two Fibre Channel adapters, ensure that the VMkernel is not sending all traffic through one adapter while the other remains dormant.
♦ Use separate storage locations for test virtual machines and production virtual machines.
♦ Build LUNs in sizes that are easy to manage yet can host multiple virtual machines. For example, create 300GB or 400GB LUNs to host five or six virtual machines, and be prepared to use Storage VMotion to move disk-intensive virtual machines. A sample LUN-preparation sequence appears at the end of this list.
♦ Use Storage VMotion to eliminate downtime when migrating a virtual machine between datastores.
♦ Use Raw Device Mappings (RDMs) for Microsoft Clustering scenarios or to provide virtual machines with access to existing LUNs that contain data on NTFS-formatted storage.
♦ Implement a solid change management practice for the deployment of new LUNs. Identify a standard-sized LUN and stray from the standard only when the situation calls for it.
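Once the SAN administrator has presented a new LUN, it can be prepared from the Service Console. The following is a minimal sketch, assuming a hypothetical Fibre Channel adapter (vmhba1), target and LUN numbers, and datastore name (PROD-LUN01); the device specification and block size are illustrative only, and a VMFS partition is assumed to have already been created on the LUN with fdisk.

   # Rescan the Fibre Channel adapter so the new LUN becomes visible.
   esxcfg-rescan vmhba1

   # List the LUNs and their Service Console device names.
   esxcfg-vmhbadevs

   # Create a VMFS3 file system on the new LUN
   # (vmhba1:0:12:1 is a placeholder for adapter:target:LUN:partition).
   vmkfstools -C vmfs3 -b 1m -S PROD-LUN01 vmhba1:0:12:1

   # Confirm the available paths to the storage device.
   esxcfg-mpath -l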
The configuration details regarding the VirtualCenter best practices shown here can be found in Chapters 5 and 8.
♦ Uninstall IIS prior to installing VirtualCenter Server.
♦ Use the Services applet in the Windows Control Panel to configure the VMware VirtualCenter Server service for automatic restart.
♦ Design a strong high availability solution for the VirtualCenter database server (i.e., Microsoft Clustering or consistent database backups).
♦ Installing VirtualCenter 2.5 with a SQL Server 2005 back-end database requires a SQL Server-authenticated user account with membership in the db_owner database role and ownership of the VirtualCenter database. Once the installation of VirtualCenter 2.5 is complete, the db_owner database role membership can (and should) be removed.
♦ Carefully monitor the transaction logs of the VirtualCenter database. To minimize transaction log growth, configure the SQL Server database in Simple Recovery mode. For maximum recoverability, configure the SQL Server database in Full Recovery mode.
♦ Configure VirtualCenter in an active/passive server cluster with Microsoft Clustering Services for high availability, or install VirtualCenter into a virtual machine and perform a copy of the virtual machine at regular intervals.
♦ Create a VirtualCenter hierarchy to support your management model. If your organization manages resources by location, then create management objects (datacenters, clusters, folders) based on location. On the other hand, if your organization manages by department, then create objects accordingly. In most organizations the VirtualCenter hierarchy will reflect a hybrid approach that combines location, department, server function, server type, and so forth.
♦ Apply the principle of least privilege to permissions assignment policies in VirtualCenter. Employees who use VirtualCenter as a common management tool should be granted only the permissions required to perform their job.
♦ Use Windows groups in the VirtualCenter security model. Assigning Windows groups to a VirtualCenter role that is assigned privileges and permissions will facilitate the application of similar settings in the future. For example, create a Windows group called DomainControllerAdmins that is a member of the VC role called DCAdmins, which has the privilege to power on and power off and has been granted the permission on a folder containing all domain controller virtual machines. When a new user is hired to administer the domain controller virtual machines, the user can simply be added to the DomainControllerAdmins Windows group and will inherit all the necessary permissions.
♦ Identify a systematic approach to LUN creation and management. Identify either the adaptive or the predictive scheme as the LUN management process. Keep in mind that your overall storage management may involve a combination of larger LUNs with several virtual machine files and smaller LUNs for individual virtual machine files.
♦ Configure the DRS automation level based on your comfort level. Some VMotion migrations will be necessary to ensure balance and fairness of resource allocation.
♦ Disable automated VMotion for critical virtual machines that you do not want DRS to select as migration candidates.
♦ If the DRS algorithm suggests a VMotion migration of four or five stars, it is in the best interest of the system to apply the recommendation. The algorithm takes into account many factors for offering recommendations that result in increased system performance.
The configuration details regarding the virtual machine best practices shown here can be found in Chapters 6 and 7.
♦ Construct virtual machines with separate drives for operating systems and user data. Place each of the virtual SCSI hard drives on separate virtual SCSI adapters.
♦ Always install VMware Tools to provide the optimized SCSI drivers, enhanced virtual NIC drivers, and support for quiescing the file system during the VMware snapshot process.
♦ Use VMware Tools to complement the Windows Time Service when synchronizing time in a virtual machine. The Windows server holding the PDC emulator operations master role should be configured to synchronize with the same time server used by the ESX Server hosts.
♦ Avoid special characters and spaces in the display names of virtual machines. Create virtual machine display names with the same rules you apply when providing DNS hostnames.
♦ During a physical-to-virtual migration, adjust the size of the hard drives to prevent excess storage consumption of the target datastore.
♦ After a physical-to-virtual migration, reduce the amount of memory to a more appropriate level. In most physical server environments, the amount of RAM is drastically overallocated. In virtual environments, resource allocation must be carefully considered.
♦ After a physical-to-virtual migration, reduce the number of CPUs to one. Increase only as needed by the virtual machine. Additional virtual CPUs can cause unwanted contention with the scheduling of multiple vCPUs onto pCPUs. The number of vCPUs in a virtual machine should be less than the number of pCPUs in the server to prevent the virtual machine from consuming all pCPUs.
♦ Maintain virtual machine templates for several different operating system installations. For example, create and maintain templates for Windows Server 2003, Windows Server 2003 Service Pack 1, Windows Server 2003 Service Pack 2, Windows Server 2008, and so forth.
♦ When templates are brought online, place them onto isolated networks away from access by standard end users.
♦ Use CPU and memory reservations to guarantee resources to critical virtual machines, and use share values to grant critical virtual machines a proportionally larger slice of resources during periods of increased contention; a sketch of how these settings appear in a virtual machine's configuration file follows this list.
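Reservations and shares are normally configured through the VI Client, but they are recorded as scheduler entries in the virtual machine's .vmx file. The following is a minimal sketch using an illustrative path and values; treat the exact parameter names as something to verify against your own .vmx files rather than as a definitive reference.

   # Inspect the resource settings recorded in a VM's configuration file
   # (/vmfs/volumes/prod01/dc01/dc01.vmx is a placeholder path).
   grep -E "sched\.(cpu|mem)\." /vmfs/volumes/prod01/dc01/dc01.vmx

   # Typical entries (values are illustrative):
   # sched.cpu.min = "500"        CPU reservation, in MHz
   # sched.cpu.shares = "high"    relative CPU priority under contention
   # sched.mem.min = "1024"       memory reservation, in MB
   # sched.mem.shares = "normal"  relative memory priority under contention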
The configuration details regarding the high availability and backup best practices shown here can be found in Chapter 10.
♦ Implement Microsoft Clustering Services to achieve high availability of individual virtual machines. Note that versions of ESX Server prior to 3.5 were certified for support of Microsoft server clusters in virtual machines. As of this writing, the recertification process for clustering in a virtual machine was not complete. Please refer to the VMware website for updated information about supported technologies.
♦ Implement VMware High Availability (HA) to provide automatic restart of virtual machines residing on an ESX Server that fails.
♦ Use strict admission control for HA clusters unless virtual machine performance is not as important as simply having the virtual machine powered on.
♦ Prioritize virtual machines for startup after server failure. Prepare a contingency plan for powering off unnecessary virtual machines in the event of server failure, resulting in reduced computing power.
♦ Implement a backup strategy that blends full virtual machine backups with file-level backups; a sample full virtual machine backup command appears at the end of this list.
♦ Purchase enough backup agents to ensure minimal recovery times for servers with critical production data. Schedule the backups to ensure that recovery times are appropriate for the data type. For example, data with greater value and a requirement for quicker restore should be backed up more often.
♦ Do not use virtual machine snapshots as long-term solutions to disaster recovery or business continuity. Snapshots are meant as a temporary means of providing an easy rollback feature and are used primarily for short term recovery purposes.
♦ Back up data as often as needed as determined by a written business continuity/disaster recovery plan. More critical data should be backed up more often to reduce data loss in the event of a disaster.
♦ Test full virtual machine and file-level backups regularly by restoring them to a test or development network.
♦ Store a copy of virtual machine backups in an off-site location. Otherwise, use tools to perform virtual machine replication to distant datacenters. Virtualization offers significant advantages in the realm of disaster recovery because virtual machines are encapsulated into a discrete set of files.
♦ Purchase licenses for Windows Server 2003 Datacenter to achieve a greater return on investment and achieve less stringent VMotion restrictions. Datacenter licenses allow for the installation of an unlimited number of virtual machines per ESX Server host.
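Full virtual machine backups of the kind described above can be taken from the Service Console with the vcbMounter utility included with ESX Server 3.x. The following is a minimal sketch, assuming a hypothetical VirtualCenter server (vc01.company.local), backup service account, virtual machine name (dc01), and destination volume; confirm the exact options against the VMware Consolidated Backup documentation for your release.

   # Export a full, point-in-time copy of the virtual machine dc01
   # to a backup volume (all names and credentials shown are placeholders).
   vcbMounter -h vc01.company.local -u backupsvc -p 'password' \
       -a name:dc01 \
       -r /vmfs/volumes/backup01/dc01-fullvm \
       -t fullvm

   # Copy the exported files off-site or hand them to the backup agent,
   # then delete the export directory to reclaim space.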
The following best practices will help you troubleshoot a problematic VI3 deployment.
♦ Monitor virtual machine performance with a combination of tools inside the virtual machine and tools in VI3. For example, use Task Manager inside of a virtual machine and the performance reports from VirtualCenter to monitor CPU utilization and to identify bottlenecks.
♦ Regularly review the CPU Ready and memory ballooning counters in the performance charts provided by VirtualCenter. Abnormally high values for either counter indicate a CPU or memory issue, respectively; the esxtop sketch at the end of this list shows how to capture the same data from the Service Console.
♦ Create virtual machine benchmarks as a standard of comparison when changes are made.
♦ Create e-mail-based performance alarms for key virtual machines. Allow administrators to be notified of system problems for virtual machines that provide core network services such as mail, databases, and authentication.
♦ Identify the root cause of any problem, then attempt fixes based on monitoring results, feature dependencies, and the company's documented change management process. For example, if VMware HA is not failing over properly, review the DNS configuration for the affected hosts, since HA relies on name resolution across ESX Server hosts.
♦ Engage in a systematic approach to identifying and fixing problems with ESX Server hosts and virtual machines.
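CPU Ready time and memory ballooning can also be watched from the Service Console with esxtop. The following is a minimal sketch, assuming a hypothetical output file; healthy thresholds vary by workload, so treat any specific numbers as rough guides rather than hard limits.

   # Interactive mode: press 'c' for the CPU screen and watch the %RDY
   # column, or 'm' for the memory screen and watch the balloon figures.
   esxtop

   # Batch mode: capture a sample every 5 seconds for 30 minutes
   # (360 iterations) to a CSV file for later analysis or benchmarking.
   esxtop -b -d 5 -n 360 > /tmp/esxtop-baseline.csv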