Wednesday, 5 August 2015

Tuning hypervisors for High Throughput Computing

Over the past set of blogs, we've looked at a number of different options for tuning High Energy Physics workloads in a KVM environment such as the CERN OpenStack cloud.

This is a summary of the findings using the HEPSpec 06 benchmark on KVM and a comparison with Hyper-V for the same workload.

For KVM on this workload, we saw a degradation in performance on large VMs.

Results for other applications may vary so each option should be verified for the target environment. The percentages from our optimisations are not necessarily additive but give an indication of the performance improvements to be expected. After tuning, we saw around 5% overhead from the following improvements.

CPU topology~0The primary focus for this function was not for performance so result is as expected
Host Model4.1-5.2%Some impacts on operations such as live migration
Turn EPT off6%Open bug report for CentOS 7 guest on CentOS 7 hypervisor
Turn KSM off0.9%May lead to an increase in memory usage
NUMA in guest~9%Needs Kilo or later to generate this with OpenStack
CPU Pinning~3%Needs Kilo or later (cumulative on top of NUMA)

Different applications will see a different range of improvements (or even that some of these options degrade performance). Experiences from other workload tuning would be welcome.

One of the things that led us to focus on KVM tuning was the comparison with Hyper-V. At CERN, we made an early decision to run a multi-hypervisor cloud building on the work by and Puppet on Windows to share the deployment scripts for both CentOS and Windows hypervisors. This allows us to direct appropriate workloads to the best hypervisor for the job.

One of the tests when we saw a significant overhead on the default KVM configuration was to compare the performance overheads for a Linux configuration on Hyper-V. Interestingly, Hyper-V achieved better performance without tuning compared to the configurations with KVM. Equivalent tests on Hyper-V showed
  • 4 VMs 8 cores: 0.8% overhead compared to bare metal 
  • 1 VM 32 cores: 3.3% overhead compared to bare metal
These performance results allowed us to focus on the potential areas for optimisation, that we needed to tune the hypervisor rather than a fundamental problem with virtualisation (with the results above for NUMA and CPU pinning)

The Hyper-V configuration pins each core to the underlying  NUMA socket which is similar to how the Kilo NUMA tuning sets KVM up.


This gives the Linux guest configuration as seen from the guest running on a Hyper-V hypervisor

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 28999 MB
node 0 free: 27902 MB
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 29000 MB
node 1 free: 28027 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

Thanks to the QEMU discuss mailing list and to the other team members who helped understand the issue (Sean Crosby (University of Melbourne) and Arne Wiebalck, Sebastian Bukowiec and Ulrich Schwickerath (CERN))