Monday, 23 March 2015

Not all cores are created equal

Within CERN's compute cloud, the hypervisors vary significantly in performance. We generally run the servers for around 5 years before retirement and there are around 3 different configurations selected each year through public procurement.

Benchmarking in High Energy Physics is done using a benchmark suite called HEPSpec 2006 (HS06). This is based on the C++ programs within the SPEC 2006 suite, run in parallel according to the number of cores in the server. The performance range is around a factor of 3 between the slowest and the fastest machines [1].


When machines are evaluated after delivery, the HS06 rating for each hardware configuration is saved into a hardware inventory database.

Defining a flavor for each hardware type was not attractive, as there are 15 different configurations to consider and users would not easily find out which flavors have free cores. Instead, users ask for the standard flavors, such as the 1-core m1.small virtual machine, and could land on a hypervisor giving 6 HS06 or one giving 16. However, accounting and quotas are done using virtual cores, so the 6 and 16 HS06 virtual cores are considered equivalent.
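To make the gap concrete, here is a minimal sketch comparing core-based and HS06-based accounting for two single-core VMs that run for the same time. The per-core ratings of 6 and 16 HS06 come from the text above; the 24-hour duration and the function names are assumptions chosen for illustration.

```python
# Illustrative only: two m1.small (1 vCPU) VMs on hypervisors of different speed.
HOURS = 24  # assumed run time, not taken from the article


def core_hours(vcpus, hours):
    """Accounting by virtual cores: every vCPU counts the same."""
    return vcpus * hours


def hs06_hours(vcpus, hs06_per_core, hours):
    """Accounting weighted by the benchmark rating of the underlying core."""
    return vcpus * hs06_per_core * hours


vms = {
    "slow": {"vcpus": 1, "hs06_per_core": 6},
    "fast": {"vcpus": 1, "hs06_per_core": 16},
}

for name, vm in vms.items():
    print(name,
          core_hours(vm["vcpus"], HOURS),
          hs06_hours(vm["vcpus"], vm["hs06_per_core"], HOURS))
```

Both VMs are charged 24 core-hours, yet the faster one delivers 384 HS06-hours against 144 for the slower one.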

In order to improve our accounting, we therefore wanted to provide the performance of the VM along with the metering records giving the CPU usage through ceilometer. Initially, we thought that this would require additional code in ceilometer, but it is actually possible using the standard ceilometer functions with transformers and publishers.


The following approach was implemented.
  • On the hypervisor, we added a meter 'hs06' which provides the CPU rating of the VM normalised by the HS06 performance of the hypervisor. This value is determined using the HS06 value stored in the hardware database, which can be provided to the hypervisor via a Puppet fact.
  • This data is stored in ceilometer in addition to the default 'cpu' record.
The benefits of this approach are
  • There is no need for external lookup to the hardware database to process the accounting
  • No additional rights are required for the accounting process (such as permission to read the mapping between VM and hypervisor)
  • Scenarios such as live migration of VMs from one hypervisor to another of different HS06 are correctly handled
  • No modifications to the ceilometer upstream code are required which both improves deployment time and does not invalidate upstream testing
  • Multiple benchmarks can be run concurrently. This allows a smooth migration from HS06 to a subsequent benchmark, HS14, by providing both sets of data.
  • Standard ceilometer behaviour is not modified so existing programs such as Heat which use this data can continue to run
  • The information is calculated directly on the hypervisor, so it is scalable, and it is calculated inline, which avoids race conditions when the virtual machine is deleted and the mapping of VM to hypervisor is therefore no longer available
The assumptions are
  • There is no overcommitment of CPU. Further enhancements to the configuration would be possible in this area but this would require further meters.
  • The accounting is based on the clock ticks delivered by the hypervisor. The work done per tick will vary in cases where the hypervisor is running a more recent version of the operating system with a later compiler (and thus probably has a higher HS06 rating). Running older OS versions is therefore correspondingly less efficient.
  • The cloud is running at least the Juno OpenStack release
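As an illustration of why live migration is handled correctly, the sketch below shows one way an accounting consumer could integrate 'hs06' gauge samples over time. It is not the accounting process used at CERN; the function name, the sample values and the step-function interpolation are all assumptions made for this example.

```python
from datetime import datetime


def hs06_seconds(samples):
    """Integrate (timestamp, hs06) gauge samples into HS06-seconds.

    Each value is assumed to hold until the next sample (a step function),
    so the last sample contributes nothing to the total.
    """
    ordered = sorted(samples, key=lambda s: s[0])
    total = 0.0
    for (t0, value), (t1, _) in zip(ordered, ordered[1:]):
        total += value * (t1 - t0).total_seconds()
    return total


# Hypothetical samples for one 1-core VM, polled every 600 seconds.
# After the second sample the VM is live-migrated from a 6 HS06 core
# to a 16 HS06 core, and the meter value follows it.
samples = [
    (datetime(2015, 3, 22, 9, 0), 6.0),
    (datetime(2015, 3, 22, 9, 10), 6.0),
    (datetime(2015, 3, 22, 9, 20), 16.0),
    (datetime(2015, 3, 22, 9, 30), 16.0),
]

print(hs06_seconds(samples) / 3600.0, "HS06-hours")
```

Because the rating travels with each sample, the consumer never needs to know which hypervisor the VM was on at any given time.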
To implement this feature, the pipeline capabilities of ceilometer are used. These are configured automatically by the puppet-ceilometer component into /etc/ceilometer/pipeline.yaml.
The changes required are in several blocks. In the sources section as indicated by
---
sources:
A further source needs to be defined to make the CPU metric available for transformation. This polls the CPU meter every 10 minutes (600 seconds) and sends the data to the hs06 sink.
    - name: hs06_source
      interval: 600
      meters:
          - "cpu"
      sinks:
          - hs06_sink
The hs06_sink processing is defined later in the file in the sinks section
sinks:
The entry below takes the number of virtual cores of the VM and scales it by 10 (the example HS06 CPU performance per core) and by 0.98 (the virtualisation overhead factor). The result is reported in units of HS06 (i.e. HEPSpec 2006). The value of 10 would be derived from the Puppet HS06 value for the machine divided by the number of cores in the server (from the Puppet fact processorcount). Puppet can be used to configure a hard-coded value per hypervisor that is delivered to the machine as a fact and used to generate the pipeline.yaml configuration file.
    - name: hs06_sink
      transformers:
          - name: "arithmetic"
            parameters:
                target:
                    name: "hs06"
                    unit: "HS06"
                    type: "gauge"
                    expr: "$(cpu).resource_metadata.vcpus*10*0.98"
      publishers:
          - notifier://
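The derivation of the per-core factor can be sketched as follows. This is a minimal illustration, not the Puppet template itself: the function names are hypothetical, and the server rating of 320 HS06 on 32 cores is an assumed value chosen to reproduce the factor of 10 used above.

```python
def hs06_expr(hs06_total, processorcount, overhead=0.98):
    """Build the arithmetic transformer expression for one hypervisor.

    hs06_total     -- HS06 rating of the whole server (assumed fact value)
    processorcount -- number of cores, as in the processorcount fact
    overhead       -- virtualisation overhead factor (0.98 in the example)
    """
    if processorcount <= 0:
        raise ValueError("processorcount must be positive")
    per_core = hs06_total / processorcount
    return "$(cpu).resource_metadata.vcpus*{:g}*{:g}".format(per_core, overhead)


def vm_hs06(vcpus, hs06_total, processorcount, overhead=0.98):
    """Value the expression would yield for a VM with the given vCPU count."""
    return vcpus * (hs06_total / processorcount) * overhead


# Assumed example: a 32-core server rated at 320 HS06 in total.
print(hs06_expr(320, 32))
print(vm_hs06(4, 320, 32))
```

The first line printed is the same expression as in the sink definition above; generating it per hypervisor keeps the pipeline free of any runtime lookup.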
Once these changes have been done, the ceilometer daemons can be restarted to get the new configuration.
 service openstack-ceilometer-compute restart
If there are errors, these will be reported to /var/log/ceilometer/compute.log. These can be checked with
egrep "(ERROR|WARNING)" /var/log/ceilometer/compute.log
Initial messages such as "dropping sample with no predecessor" are to be expected, as they arise when differences between the previous values and the current ones are calculated (such as for cpu utilisation).
After 10 minutes or so, ceilometer will poll the CPU and generate the new hs06 value, which can be queried using the ceilometer CLI.
ceilometer meter-list | grep hs06
The output will include the hs06 meter
| hs06                                | cumulative | HS06        | c6af7651-5fc5-4d37-bf57-c85238ee098c         | 1cdd42569f894c83863e1b76e165a70c | c4b673a3bb084b828ab344a07fa40f54 |
| hs06                                | cumulative | HS06        | e607bece-d9df-4792-904a-3c4adca1b99c         | 1cdd42569f894c83863e1b76e165a70c | c4b673a3bb084b828ab344a07fa40f54 |
The last 5 entries in the database can be retrieved with
ceilometer sample-list -m hs06 -l 5
which produces the output
+--------------------------------------+------+-------+--------+------+---------------------+
| Resource ID                          | Name | Type  | Volume | Unit | Timestamp           |
+--------------------------------------+------+-------+--------+------+---------------------+
| 1fa28676-b41c-4673-9d31-1fa83711725a | hs06 | gauge | 12.0   | HS06 | 2015-03-22T09:19:49 |
| 1fa28676-b41c-4673-9d31-1fa83711725a | hs06 | gauge | 12.0   | HS06 | 2015-03-22T09:16:49 |
| 1fa28676-b41c-4673-9d31-1fa83711725a | hs06 | gauge | 12.0   | HS06 | 2015-03-22T09:13:49 |
| 1fa28676-b41c-4673-9d31-1fa83711725a | hs06 | gauge | 12.0   | HS06 | 2015-03-22T09:10:49 |
| b812c69c-3c9f-4146-952e-078a266b11c5 | hs06 | gauge | 11.0   | HS06 | 2015-03-22T08:54:25 |
+--------------------------------------+------+-------+--------+------+---------------------+
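For quick checks, a table like the one above can be turned into records with a few lines of Python. This is a rough sketch that parses the CLI's ASCII table; the function name is hypothetical, and an accounting process would more robustly query the ceilometer API directly rather than scrape CLI output.

```python
def parse_cli_table(text):
    """Parse an ASCII table, as printed by the ceilometer CLI, into dicts."""
    def split(line):
        # Drop the outer '|' characters, then split on the inner ones.
        return [cell.strip() for cell in line.strip().strip("|").split("|")]

    rows = [line for line in text.splitlines() if line.startswith("|")]
    if not rows:
        return []
    header = split(rows[0])
    return [dict(zip(header, split(line))) for line in rows[1:]]


# Two rows copied from the sample-list output above.
sample_output = """\
+--------------------------------------+------+-------+--------+------+---------------------+
| Resource ID                          | Name | Type  | Volume | Unit | Timestamp           |
+--------------------------------------+------+-------+--------+------+---------------------+
| 1fa28676-b41c-4673-9d31-1fa83711725a | hs06 | gauge | 12.0   | HS06 | 2015-03-22T09:19:49 |
| b812c69c-3c9f-4146-952e-078a266b11c5 | hs06 | gauge | 11.0   | HS06 | 2015-03-22T08:54:25 |
+--------------------------------------+------+-------+--------+------+---------------------+
"""

for record in parse_cli_table(sample_output):
    print(record["Resource ID"], float(record["Volume"]), record["Timestamp"])
```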

References

  1. Ulrich Schwickerath, "VM benchmarking: update on CERN approach" - http://indico.cern.ch/event/319819/session/1/contribution/7/material/slides/0.pdf
  2. Ceilometer architecture - http://docs.openstack.org/developer/ceilometer/architecture.html
  3. Basic introduction to ceilometer using RDO - https://www.rdoproject.org/CeilometerQuickStart
  4. Ceilometer configuration guide for transformers - http://docs.openstack.org/admin-guide-cloud/content/section_telemetry-pipeline-configuration.html
  5. Ceilometer arithmetic guide - https://github.com/openstack/ceilometer-specs/blob/master/specs/juno/arithmetic-transformer.rst