Steps 1, 2 and 6 apply only to hosted products, not to ESX Server. The other steps concern the hardware or the guest environment and also apply to ESX Server.
Hosted products = VMware Workstation, Server, Player, ACE, Fusion
ESX = ESX, ESXi, vSphere Hypervisor (in the context of this page)
Step 2 makes each VM use more host memory, so don't assign more memory to your guests than is available on the host. Overcommitting memory is possible, but is normally only recommended on ESX. Once the host starts swapping, everything gets very slow.
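A quick way to sanity check the memory budget is to add up the guest allocations and compare them with the host RAM. This is only a minimal sketch: the function name, the 1024 MB host reserve and the example figures are assumptions made for illustration, not values from VMware.

```python
def memory_headroom_mb(host_ram_mb, guest_ram_mb, host_reserve_mb=1024):
    """Return the host RAM (in MB) left after the host reserve and all guests.

    A negative result means the guests are overcommitted.
    host_reserve_mb is an assumed allowance for the host OS itself.
    """
    return host_ram_mb - host_reserve_mb - sum(guest_ram_mb)


if __name__ == "__main__":
    # Hypothetical host with 8 GB of RAM running three guests.
    headroom = memory_headroom_mb(8192, [2048, 2048, 1024])
    print(f"headroom: {headroom} MB")
    if headroom < 0:
        print("guests are overcommitted; expect the host to swap")
```

If the result is negative on a hosted product, shrink a guest or add RAM before applying the settings in step 2.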
1. Disable any power-saving functions in the host OS and the BIOS. <- More important on hosted products than on ESX
- This includes SpeedStep, PowerNow!, Cool'n'Quiet and similar
- In the BIOS, choose Maximum performance
- After these steps your cpu speed should stay constant (and power usage will be higher)
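On a linux host you can check whether the cpu clock is still allowed to vary by reading the cpufreq governor of each cpu. The sketch below assumes the kernel exposes the usual cpufreq sysfs files under /sys/devices/system/cpu; the function names are made up for this example, and an empty result only means that no cpufreq information was found.

```python
import glob
import os


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Map each cpu name (cpu0, cpu1, ...) to its cpufreq scaling governor."""
    governors = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        cpu = path.split(os.sep)[-3]  # .../cpuN/cpufreq/scaling_governor
        with open(path) as handle:
            governors[cpu] = handle.read().strip()
    return governors


def cpus_not_at_performance(governors):
    """Return the cpus whose governor is anything other than 'performance'."""
    return sorted(cpu for cpu, gov in governors.items() if gov != "performance")


if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("no cpufreq information found (scaling may be disabled or unsupported)")
    for cpu in cpus_not_at_performance(found):
        print(f"{cpu}: governor is {found[cpu]!r}, clock speed may vary")
```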
2. Set a few options in the global config file or in the individual VM's config file (.vmx) for better performance. <- not needed on ESX
- The global config file is usually found in /etc/vmware/config or c:\Documents and Settings\All Users\Application Data\VMware\VMware <product name>\config.ini
- The important lines are these:
MemTrimRate = "0"
sched.mem.pshare.enable = "FALSE"
mainMem.useNamedFile = "FALSE"
prefvmx.minVmMemPct = "100"
- By disabling MemTrimRate, memory allocation inside the guest is faster because memory is no longer handed back to the host os and taken again on every request.
- By disabling memory sharing (sched.mem.pshare.enable) your guests will not share common memory blocks. Your VMware product will also stop comparing memory blocks.
- When allocating memory, VMware stores parts of the guest's memory in a file. This file is as large as the memory allocated to the guest VM. It exists because the ram allocation method used is mmap. Changing the setting for mainMem.useNamedFile moves this file from the VM's default location to /tmp on linux or into the swap file on windows. This will help a bit, especially if that location is on a different disk than the VM. On linux it will help further if you use the tmpfs file system for /tmp (or ramfs if you can afford it) (details here)
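If you have many VMs, editing each config file by hand gets tedious. Here is a minimal sketch of a helper that sets the options from step 2 in the text of a config file without duplicating keys. The helper name is invented for this example, MemTrimRate = "0" is assumed to be the value that disables trimming, and keys are matched case-sensitively. Shut the VM down and keep a backup of the file before writing anything back.

```python
import re

# Options discussed in step 2. The "0" for MemTrimRate is an assumption.
TUNING = {
    "MemTrimRate": "0",
    "sched.mem.pshare.enable": "FALSE",
    "mainMem.useNamedFile": "FALSE",
    "prefvmx.minVmMemPct": "100",
}

# Matches lines of the form: key = "value"
LINE_RE = re.compile(r'^\s*([A-Za-z0-9_.:]+)\s*=\s*"(.*)"\s*$')


def apply_settings(config_text, settings=TUNING):
    """Return config_text with every key in settings present exactly once."""
    remaining = dict(settings)
    out = []
    for line in config_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(1) in settings:
            key = match.group(1)
            if key in remaining:
                # First occurrence: rewrite it with the wanted value.
                out.append(f'{key} = "{remaining.pop(key)}"')
            # Later duplicates of the same key are dropped.
        else:
            out.append(line)
    # Append the keys that were not in the file at all.
    out.extend(f'{key} = "{value}"' for key, value in remaining.items())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = 'memsize = "1024"\nmainMem.useNamedFile = "TRUE"\n'
    print(apply_settings(sample), end="")
```

Running the helper twice gives the same result as running it once, so it is safe to reapply after an upgrade rewrites the file.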
3. A fast physical disk subsystem is important. <- All products
- After memory, the disk is the most common bottleneck. On a physical server the disk is accessed by that one server only; on a virtualization host the same disk system carries the load of several servers.
- Avoid using virtual ide drives inside your guest VMs. They are slower and will put more cpu load on the system than virtual scsi drives.
- Using separate disks for the OS and the VMs will do you good.
- A good controller will do you even better. If you plan to run more than a few VMs on your server, a disk controller with a battery-backed write cache will help performance a great deal. If your system is connected to a UPS you might also benefit from enabling caching on the individual disks (it will not make much difference in performance, but it lowers cpu usage).
- Avoid using software raids. Even though the performance today is ok when using software raids on normal servers, it will also put extra load on the system. You will need as much performance from your storage as possible as there is also a virtualization overhead present.
- Use native drivers from the hw vendor in the host os (and vmware tools inside the guest). Firmware on controllers and disks might sometimes also have an effect.
- Use preallocated disks to avoid fragmentation.
- Snapshots have a negative effect on your storage performance. They also affect the other VMs on the same LUN.
- ESX: Storage devices that support VAAI will speed up certain operations by an order of magnitude
- ESX: Plan your SAN setup carefully. Make sure you don't have too many VMs per LUN. Also make sure to load balance your traffic between your SAN controllers. If performance is bad, use your SAN tools to check the cache hit ratio. VMware have a SAN design guide with good recommendations.
- ESX: Do not connect too many ESX servers to the same SAN controller. It will affect the latency.
- ESX: Fibre Channel normally gives better IO than iSCSI/NFS (details), but the use of 10GbE is making this a smaller issue than it used to be
- ESX: Use LBT (Load Based Teaming) whenever possible.
4. Many linux kernels are not tuned for acting as a guest (details here) <- All products
- In 2.4 kernels the system timer was normally clocked at 100 Hz, while in 2.6 the default system timer is set to 1000 Hz (some distros do not follow these "rules", but USER_HZ is always set to 100 so as not to break compatibility with existing tools). 1000 Hz is definitely a good thing for physical desktop computers, but it has bad side effects when virtualized. You will typically see that the load of an idle VM is higher than expected and that the clock inside the guest does not keep time correctly.
- The solution is to recompile the kernel to 100Hz or (on recent RHEL based distros) to use the tick divider boot option "divider=10".
- Using the kernel parameters "nosmp noapic nolapic" could also have a positive effect (don't use nosmp if your guest os has more than 1 vcpu)
- For 32bit linux guests that have paravirt_ops in the kernel (2.6.22 and newer) you should enable VMI (paravirtualization, more important on non RVI/EPT cpu systems).
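To find out which timer rate a guest kernel was built with, you can look for CONFIG_HZ in the kernel's build config. The sketch below parses the text of such a config file; where that file lives (often /boot/config-<kernel version>) is an assumption that depends on the distro, and both function names are made up for this example.

```python
import re


def configured_hz(kernel_config_text):
    """Return the CONFIG_HZ value from kernel config text, or None if absent."""
    match = re.search(r"^CONFIG_HZ=(\d+)\s*$", kernel_config_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def effective_hz(hz, divider=1):
    """Timer rate after applying a tick divider such as divider=10."""
    return hz // divider


if __name__ == "__main__":
    sample = "CONFIG_HZ_1000=y\nCONFIG_HZ=1000\n"
    hz = configured_hz(sample)
    print(f"built with {hz} Hz, {effective_hz(hz, 10)} Hz with divider=10")
```

A guest that reports 1000 Hz is a candidate for the recompile or the divider option mentioned above.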
5. Unicpu VMs might give best performance <- All products. Higher impact on hosted products than on ESX.
- Multi cpu guests add extra overhead to virtualization and should only be used after your testing shows that a single cpu wasn't enough. If you have only two cores in total in your host you should never give any of your guests more than a single cpu. On ESX having a few vsmp enabled VMs may not hurt so badly, but once the system gets heavy load the negative effect is much clearer. 4 vcpu VMs are also affected much more heavily than 2 vcpu VMs. (ESX2, Workstation & Server -> details here , ESX3.x -> details here, ESX4 -> details here)
- It's always a good practice to give your guest VMs a single cpu first and add more as needed later.
- If you install a Windows 2003 guest with two vcpu's allocated you'll have a harder time reverting the HAL back to a single cpu HAL than if you started with a single vcpu. (details here)
6. Swap <- Hosted products
- Since hosted VMware products will always use swap if present, you might benefit from disabling it. On linux, VMware will use separate files (so it won't help anything), while on windows it will use the system's swap file if you use the mainMem.useNamedFile option as suggested in point 2 above. If you are 100% sure you have more than enough ram in your system and you are a brave fellow, you might benefit from disabling the windows page file. This is however a very unsupported solution and I wouldn't recommend it for anything other than experimental usage.
7. Virtual scsi controller & disks <- All products
- Using an LSILogic disk controller will normally give better performance than using the Buslogic controller (details here). A virtual IDE controller will cause higher cpu load in the VM and is slower than both the virtual scsi controllers.
- Using the paravirtual disk controller available in vSphere 4.0 gives the best performance at the lowest cpu load (details here) if you're using a SAN instead of local storage. Also note that you need ESX 4u1 in order to use this disk controller on the system disk, and that you can't use this driver if you're using VMware Fault Tolerance (FT).
- Using preallocated disks is faster than growable disks.
- Use eagerZeroedThick virtual disks for best performance (details). This is not the default setting, but disks can be converted.
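A small script can flag VMs that still use the slower virtual hardware. This is a rough sketch that scans the text of a .vmx file for virtual IDE disks and BusLogic controllers. It assumes the usual key names (ideX:Y.present, ideX:Y.fileName, scsiX.virtualDev) and treats an IDE device as a disk only when its file name ends in .vmdk; both function names are invented for this example.

```python
import re

# Matches lines of the form: key = "value"
ASSIGNMENT_RE = re.compile(r'^\s*([A-Za-z0-9_.:]+)\s*=\s*"(.*)"\s*$')


def parse_vmx(text):
    """Parse key = "value" lines into a dict with lower-cased keys."""
    settings = {}
    for line in text.splitlines():
        match = ASSIGNMENT_RE.match(line)
        if match:
            settings[match.group(1).lower()] = match.group(2)
    return settings


def disk_warnings(settings):
    """Return warnings for virtual IDE disks and BusLogic controllers."""
    warnings = []
    for key, value in sorted(settings.items()):
        if re.fullmatch(r"ide\d+:\d+\.present", key) and value.upper() == "TRUE":
            device = key.rsplit(".", 1)[0]
            # Skip CD-ROM drives: only .vmdk-backed devices count as disks.
            if settings.get(device + ".filename", "").lower().endswith(".vmdk"):
                warnings.append(f"{device}: virtual IDE disk, prefer a virtual scsi disk")
        if re.fullmatch(r"scsi\d+\.virtualdev", key) and value.lower() == "buslogic":
            controller = key.split(".")[0]
            warnings.append(f"{controller}: BusLogic controller, LSILogic is normally faster")
    return warnings


if __name__ == "__main__":
    sample = (
        'scsi0.virtualDev = "buslogic"\n'
        'ide0:0.present = "TRUE"\n'
        'ide0:0.fileName = "old.vmdk"\n'
    )
    for warning in disk_warnings(parse_vmx(sample)):
        print(warning)
```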
8. Install VMware Tools <- All products
- ..even if it's a text console only VM.
9. Use a cpu that supports MMU virtualization <- All products
- CPUs that support MMU virtualization will give better performance for workloads that are MMU intensive (details here). Currently, this is supported on AMD cpus codenamed Barcelona, Budapest, Shanghai and Istanbul and Intel 55xx/56xx cpus. AMD has named this feature RVI, but also refers to it as nested paging. On Intel cpus this feature is known as EPT (extended page tables). This feature is supported in Workstation 6.5, Server 2.0, ESX 3.5 (only AMD) and ESX 4.0 and newer.
10. Remove limits <- ESX
- Limits that have been applied without your knowledge may be hurting performance quite badly. CPU limits can give the VMs a high ready time because they are not scheduled on the cpu as often as they normally would be. Limits on memory are however the worst offender, since they make the VM swap memory out to disk instead of using RAM. We have observed such limits on VMs that have been through several upgrades of ESX and that at some point were configured with more memory than they had initially. We have not been able to conclude which versions cause this issue, but it has been observed at many customers.
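The check itself is simple once you have an inventory of your VMs: flag every VM whose memory limit is lower than its configured memory. The sketch below works on a plain list of tuples. How you export that list from your environment is left open, and the VM names and numbers are hypothetical.

```python
def limited_below_configured(vms):
    """Return the names of VMs whose memory limit is below their configured memory.

    Each entry is (name, configured_mb, limit_mb); a limit_mb of None
    means that no limit is set.
    """
    return [
        name
        for name, configured_mb, limit_mb in vms
        if limit_mb is not None and limit_mb < configured_mb
    ]


if __name__ == "__main__":
    # Hypothetical inventory: web01 was upgraded from 2 GB to 4 GB,
    # but an old 2 GB limit was left behind.
    inventory = [
        ("web01", 4096, 2048),
        ("db01", 8192, None),
        ("app01", 2048, 2048),
    ]
    for name in limited_below_configured(inventory):
        print(f"{name}: memory limit is lower than configured memory")
```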
VMware have released an excellent paper on performance troubleshooting here: http://communities.vmware.com/docs/DOC-14905
VMware also has a very good paper on performance tuning for Workstation 6 here.