LHC Tunnel

Sunday, 2 August 2015

CPU Model Selection for High Throughput Computing

As part of the work to tune the configuration of the CERN cloud, we have been exploring various options for compute-intensive workloads.

One option in the Nova configuration selects which CPU model is visible in the guest.

The choices are as follows:
  • host-passthrough provides an exact view of the underlying processor
  • host-model provides a view of a processor model which is close to the underlying processor but gives the same view for several processors, e.g. a range of different frequencies within the same processor family
  • custom allows the administrator to select the exact characteristics of the processor presented to the guest
  • none gives the hypervisor default configuration
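As a minimal sketch of how this choice is expressed, the snippet below renders a [libvirt] section for nova.conf using the cpu_mode and cpu_model options of Nova's libvirt driver. The helper name render_libvirt_section is hypothetical, and the model name SandyBridge in the usage note is only an illustrative value, not a setting taken from the CERN configuration.

```python
import configparser
import io

# The four values described above, spelled as Nova's libvirt driver expects them.
VALID_CPU_MODES = ("host-passthrough", "host-model", "custom", "none")


def render_libvirt_section(cpu_mode, cpu_model=None):
    """Return the text of a [libvirt] section selecting the given CPU mode.

    Hypothetical helper for illustration; it only builds text and does not
    touch any real nova.conf file.
    """
    if cpu_mode not in VALID_CPU_MODES:
        raise ValueError(f"unknown cpu_mode: {cpu_mode!r}")
    # A model name only makes sense together with the custom mode.
    if (cpu_mode == "custom") != (cpu_model is not None):
        raise ValueError("cpu_model must be given if and only if cpu_mode is 'custom'")

    config = configparser.ConfigParser()
    config["libvirt"] = {"cpu_mode": cpu_mode}
    if cpu_model is not None:
        config["libvirt"]["cpu_model"] = cpu_model

    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(render_libvirt_section("host-passthrough"))
```

Calling render_libvirt_section("custom", "SandyBridge") would add a cpu_model line to the same section. The setting applies per hypervisor, so it would be written on each compute node.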
There are a number of factors to consider for this selection:
  • Migration between hypervisors requires the guest to see the same processor on the source and the destination. Thus, if host-passthrough is configured and the VM is migrated to a new generation of servers with a different processor, the operation will fail.
  • Performance will vary, with host-passthrough being the fastest because the application can use the full feature set of the processor. The extended instructions available differ between settings, as the flags listed at the end of this article show.
The exact performance impact will vary according to the application. High Energy Physics uses the HEPSpec06 benchmark suite, which is a subset of the SPEC 2006 benchmarks. With this benchmark, we observed a reduction of around 4% in the performance of CPU-bound applications when using host-model. Moving to the hypervisor default (none) gave an overhead of 5%.
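For readers who want to repeat the comparison on their own hardware, the percentages above can be computed as a relative drop in benchmark score. This is a sketch only: the scores below are hypothetical placeholders rather than CERN measurements, and taking host-passthrough as the reference is an assumption.

```python
def relative_overhead_percent(reference_score, measured_score):
    """Percentage by which measured_score falls short of reference_score."""
    if reference_score <= 0:
        raise ValueError("reference_score must be positive")
    return (reference_score - measured_score) / reference_score * 100.0


# Hypothetical benchmark scores, chosen only to illustrate the arithmetic.
passthrough_score = 100.0  # assumed reference: guest using host-passthrough
host_model_score = 96.0    # same benchmark in a guest using host-model

overhead = relative_overhead_percent(passthrough_score, host_model_score)
print(f"{overhead:.1f}%")
```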


Given the significant differences, the CERN cloud is configured such that
  • hypervisors running compute-intensive workloads are configured for maximum performance (host-passthrough). These workloads are generally easy to re-create, so there is no need to migrate them between hypervisors (for example, during a warranty replacement); instead, new instances can be created on the new hardware and the old instances deleted
  • hypervisors running services are configured with host-model so that the VMs can be migrated between generations of equipment, and between hypervisors if required, for example for an intervention
In the future, we would be interested in making this setting an option at VM creation, such as metadata on the nova boot command or a specific property on an image, so that end users could choose the appropriate option for their workloads.
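The policy described above can be sketched as a simple lookup. Everything here is illustrative: the workload labels, the choose_cpu_mode helper and the requested argument (standing in for a possible future per-VM setting) are assumptions, not existing Nova features.

```python
# Workload categories are hypothetical labels for the two cases in the article.
CPU_MODE_BY_WORKLOAD = {
    "compute": "host-passthrough",  # maximum performance, instances are re-created
    "service": "host-model",        # can be migrated between hardware generations
}

# Modes an end user might be allowed to ask for (assumption for this sketch).
ALLOWED_OVERRIDES = {"host-passthrough", "host-model"}


def choose_cpu_mode(workload, requested=None):
    """Pick a CPU mode, letting a hypothetical per-VM request take precedence."""
    if requested is not None:
        if requested not in ALLOWED_OVERRIDES:
            raise ValueError(f"unsupported CPU mode request: {requested!r}")
        return requested
    try:
        return CPU_MODE_BY_WORKLOAD[workload]
    except KeyError:
        raise ValueError(f"unknown workload category: {workload!r}") from None


print(choose_cpu_mode("compute"))
print(choose_cpu_mode("service"))
print(choose_cpu_mode("compute", requested="host-model"))
```

Today the choice is made per hypervisor rather than per VM, so in practice the lookup corresponds to which group of hypervisors an instance is scheduled on.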

host-passthrough

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
stepping        : 4
microcode       : 1
cpu MHz         : 2593.748
cache size      : 4096 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good unfair_spinlock pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms
bogomips        : 5187.49
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

host-model

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel Xeon E312xx (Sandy Bridge)
stepping        : 1
microcode       : 1
cpu MHz         : 2593.748
cache size      : 4096 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good unfair_spinlock pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms
bogomips        : 5187.49
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

none

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 13
model name      : QEMU Virtual CPU version 1.5.3
stepping        : 3
microcode       : 1
cpu MHz         : 2593.748
cache size      : 4096 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 4
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good unfair_spinlock pni cx16 hypervisor lahf_lm
bogomips        : 5187.49
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
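The three listings are easier to compare as sets of flags. The following sketch parses the flags line of /proc/cpuinfo output and reports which flags one setting exposes that another does not. The helper names are hypothetical; the flag strings are copied from the listings above.

```python
def parse_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(reference, other):
    """Flags present in reference but absent from other, sorted by name."""
    return sorted(reference - other)


HOST_PASSTHROUGH = (
    "flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
    "pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm "
    "constant_tsc arch_perfmon rep_good unfair_spinlock pni pclmulqdq ssse3 "
    "cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx "
    "f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms"
)
HOST_MODEL = (
    "flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
    "pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm "
    "constant_tsc rep_good unfair_spinlock pni pclmulqdq ssse3 cx16 pcid "
    "sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand "
    "hypervisor lahf_lm xsaveopt fsgsbase smep erms"
)
NONE_DEFAULT = (
    "flags : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 "
    "clflush mmx fxsr sse sse2 syscall nx lm rep_good unfair_spinlock pni cx16 "
    "hypervisor lahf_lm"
)

passthrough = parse_flags(HOST_PASSTHROUGH)
host_model = parse_flags(HOST_MODEL)
default = parse_flags(NONE_DEFAULT)

print(missing_flags(passthrough, host_model))  # ['arch_perfmon']
print(missing_flags(host_model, default))
```

For these listings, host-model differs from host-passthrough only by arch_perfmon, while the hypervisor default additionally lacks extensions such as ssse3, sse4_1, sse4_2, aes and avx.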

Previous blogs in this series are:
  • CPU topology - http://openstack-in-production.blogspot.fr/2015/08/openstack-cpu-topology-for-high.html

Contributions from Ulrich Schwickerath and Arne Wiebalck have been included in this article.