LHC Tunnel

LHC Tunnel

Wednesday 21 February 2018

Maximizing resource utilization with Preemptible Instances

Motivation


The CERN cloud consists of around 8,500 hypervisors providing over 36,000
virtual machines. These provide the compute resources for both the laboratory's
physics program but also for the organisation's administrative operations such
as paying bills and reserving rooms at the hostel.

The resources themselves are generally ordered once to twice a year with servers being kept for around 5 years. Within the CERN budget, the resource planning teams looks at:
  • The resources required to run the computing services requirements for the CERN laboratory. These are projected using capacity planning trend data and upcoming projects such as video conferencing.
With the installation and commissioning of thousands of servers concurrently
(along with their associated decommissioning 5 years later), there are scenarios
to exploit underutilised servers. Programs such as LHC@Home are used but we have also been interested to expand the cloud to provide virtual machine instances which can be rapidly terminated in the event of
  • Resources being required for IT services as they scale out for events such as a large scale web cast on a popular topic or to provision instances for a new version of an application.
  • Partially full hypervisors where the last remaining cores are not being requested (the Tetris problem).
  • Compute servers at the end of their lifetime which are used to the full before being removed from the computer centre to make room for new deliveries which are more efficient and in warranty.
The characteristics of this workload is that it should be possible to stop an
instance within a short time (a few minutes) compared to a traditional physics job.

Resource Management In Openstack

Operators use project quotas for ensuring the fair sharing of their infrastructure. The problem with this, is that quotas pose as hard limits.This
leads to actually dedicating resources for workloads even if they are not used
all the time or to situations where resources are not available even though
there is quota still to use.

At the same time, the demand for cloud resources is increasing rapidly. Since
there is no cloud with infinite capabilities, operators need a way to optimize
the resource utilization before proceeding to the expansion of their infrastructure.

Resources in idle state can occur, showing lower cloud utilization than the full
potential of the acquired equipment while the users’ requirements are growing.

The concept of Preemptible Instances can be the solution to this problem. These
type of servers can be spawned on top of the project's quota, making use of the
underutilised  capabilities. When the resources are requested by tasks with
higher priority (such as approved quota), the preemptible instances are
terminated to make space for the new VM.

Preemptible Instances with Openstack

Supporting preemptible instances, would mirror the AWS Spot Market and the
Google Preemptible Instances. There are multiple things to be addressed here as
part of an implementation with OpenStack, but the most important can be reduced to these:
  1. Tagging Servers as Preemptible
In order to be able to distinguish between preemptible and non-preemptible
servers, there is the need to tag the instances at creation time. This property
should be immutable for the lifetime of the servers.
  1. Who gets to use preemptible instances
There is also the need to limit which user/project is allowed to use preemptible
instances. An operator should be able to choose which users are allowed to spawn this type of VMs.
  1. Selecting servers to be terminated
Considering that the preemptible instances can be scattered across the different cells/availability zones/aggregates, there has to be “someone” able to find the existing instances, decide the way to free up the requested resources according to the operator’s needs and, finally, terminate the appropriate VMs.
  1. Quota on top of project’s quota
In order to avoid possible misuse, there could to be a way to control the amount of preemptible resources that each user/project can use. This means that apart from the quota for the standard resource classes, there could be a way to enforce quotas on the preemptible resources too.

OPIE : IFCA and Indigo Dataclouds

In 2014, there were the first investigations into approaches by Alvaro Lopez
from IFCA (https://blueprints.launchpad.net/nova/+spec/preemptible-instances).
As part of the EU Indigo Datacloud project, this led to the development of the
OpenStack Pre-Emptible Instances package (https://github.com/indigo-dc/opie).
This was written up in a paper to Journal of Physics: Conference Series
(http://iopscience.iop.org/article/10.1088/1742-6596/898/9/092010/pdf) and
presented at the OpenStack summit (https://www.youtube.com/watch?v=eo5tQ1s9ZxM)

Prototype Reaper Service

At the OpenStack Forum during a recent OpenStack summit, a detailed discussion took place on how spot instances could be implemented without significant changes to Nova. The ideas were then followed up with the OpenStack Scientific Special Interest Group.

Trying to address the different aspects of the problem, we are currently
prototyping a “Reaper” service. This service acts as an orchestrator for
preemptible instances. It’s sole purpose is to decide the way to free up the
preemptible resources when they are requested for another task.

The reason for implementing this prototype, is mainly to help us identify
possible changes that are needed in Nova codebase to support Preemptible
Instances.

More on this WIP can be found here: 

Summary

The concept of Preemptible Instances gives operators the ability to provide a
more "elastic" capacity. At the same time, it enables the handling of increased
demand for resources, with the same infrastructure, by maximizing the cloud
utilization.

This type of servers is perfect for tasks/apps that can be terminated at any
time, enabling the users to take advantage of extra cpu power on demand without the fixed limits that quotas enforce.

Finally, here in CERN, there is an ongoing effort to provide a prototype
orchestrator for Preemptible Servers with Openstack, in order to pinpoint the
changes needed in Nova to support this feature optimally. This could also be
available in future for other OpenStack clouds in use by CERN such as the
T-Systems Open Telekom Cloud through the Helix Nebula Open Science Cloud
project.

Contributors

  • Theodoros Tsioutsias (CERN openlab fellow working on Huawei collaboration)
  • Spyridon Trigazis (CERN)
  • Belmiro Moreira (CERN)

References

16 comments:

  1. Thank you so much for this very useful insight!

    Openstack Training

    ReplyDelete
  2. Montedo é o ministro da economia brasileiro e os nossos carros já estão com o IPVA SP pagos e o Licenciamento anual em dia. E o nosso Coluna do Flamengo ? cdf

    ReplyDelete
  3. Erectile dysfunction is a condition where a man is not able to get an erection. Even if they are able to get an erection, it does not last for too long. Suffering from erectile dysfunction can affect the person both physically and mentally. Therefore a person needs to take medical help to make the condition better. Also suffering from erectile dysfunction can affect the relation of the person with their partners. The medication that has brought about a wave of change in this field is the use of Viagra for erectile dysfunction. It works by targeting the basic cause of the issue thus helping millions of men all across the world. If you are a man who has been facing an issue with getting and maintaining an erection for a long time now, then you should
    .Buy Viagra online

    ReplyDelete

  4. I'm very happy to search out this information processing system. I would like to thank you for this fantastic read!!
    Openstack Training
    Openstack Training Online
    Openstack Training in Hyderabad

    ReplyDelete
  5. I have similar musings on quite a bit of this material. I am happy I'm by all account not the only individual who thinks along these lines. You have truly composed a superb quality article here. Much thanks.


    SEO services in kolkata
    Best SEO services in kolkata
    SEO company in kolkata
    Best SEO company in kolkata
    Top SEO company in kolkata
    Top SEO services in kolkata
    SEO services in India
    SEO copmany in India

    ReplyDelete
  6. Hello, I want to have one of these piercing models, but I am unsure. Can you look at it?

    Double Helix Piercing

    Snug Piercing

    Snake Eyes Piercing

    Tragus Piercing

    Corset Piercing

    Tragus Piercing & Piercing

    ReplyDelete
  7. I have been reading out many of your articles and it’s clever stuff. I will make sure to bookmark your blog.
    토토
    경마
    온라인경마

    ReplyDelete
  8. I have been absent for a while, but now I remember why I used to love this web site. Thank you, I will try and check back more frequently. How frequently you update your web site?
    majortotosite
    racesite
    oncasinosite
    totopick

    ReplyDelete
  9. Thanks for sharing this valuable information to our vision. You have posted a trust worthy blog keep sharing.Nice article i was really impressed by seeing this article, it was very interesting and it is very useful for me..

    python internship | web development internship |internship for mechanical engineering students |mechanical engineering internships |java training in chennai |internship for 1st year engineering students |online internships for cse students |online internship for engineering students |internship for ece students |data science internships

    ReplyDelete
  10. Specialized abilities likewise incorporate a comprehension of such innovations as SQL and test-driven improvement. It is additionally vital to be know about various conventions like ERC20 or ERC721 as well as other metaverse programming advancement administrations. One doesn't just have to figure out them yet in addition have the option to work with them and backing the foundation at all degrees of advancement.

    Other than a few fundamental specialized abilities, a strength engineer likewise has to know how to function with network safety. It is a fundamental prerequisite for any business, particularly one that works with cryptographic money. It furnishes clients with high protection of their information as well as their moving>> solidity developer salary

    ReplyDelete