With the accelerator stopped over the CERN annual closure until mid-March, this is a good period to plan reconfigurations of compute resources, such as the migration of our central batch system, which schedules jobs across the central compute resources, to a new system based on HTCondor. The compute resources are heavily used, but there is more flexibility to drain some parts in the quieter periods of the year when there is not 10PB/month coming from the detectors. This year, however, we had an unexpected additional task: deploying the fixes for the Meltdown and Spectre exploits across the centre.
The CERN environment is based on Scientific Linux CERN 6 and CentOS 7. The hypervisors are now entirely CentOS 7 based, with guests running a variety of operating systems including Windows flavors and CERNVM. The upgrade campaign involved a number of steps:
- Assess the security risk
- Evaluate the performance impact
- Test the upgrade procedure and stability
- Plan the upgrade campaign
- Communicate with the users
- Execute the campaign
Security Risk
The CERN environment consists of a mixture of different services, with thousands of projects on the cloud, distributed across two data centres in Geneva and Budapest.
Two major risks were identified:
- Services which allow end users to run their own programs alongside others sharing the same kernel. Examples are the public login services and the batch farms. Public login services provide an interactive Linux environment where physicists from around the world can log in, prepare papers, develop and debug applications, and submit jobs to the central batch farms. The batch farms themselves provide thousands of worker nodes that process the data from CERN experiments by farming event after event out to free compute resources. Both environments are multi-user and allow end users to compile their own programs, and were thus rated as high risk for the Meltdown exploit.
- The hypervisors provide support for a variety of different types of virtual machines. Different areas of the cloud provide access to different network domains or to compute optimised configurations. Many of these hypervisors will have VMs owned by different end users and therefore can be exposed to the Spectre exploits, even if the performance is such that exploiting the problem would take significant computing time.
There is a variety of hypervisor configurations, which we broke down by processor type (in view of the Spectre microcode patches). Each of these needs independent performance and stability checks.
| Microcode | Assessment | #HVs | Processor name(s) |
|---|---|---|---|
| 06-3f-02 | covered | 3332 | E5-2630 v3 @ 2.40GHz, E5-2640 v3 @ 2.60GHz |
| 06-4f-01 | covered | 2460 | E5-2630 v4 @ 2.20GHz, E5-2650 v4 @ 2.20GHz |
| 06-3e-04 | hopefully | 1706 | E5-2650 v2 @ 2.60GHz |
| ?? | unclear | 427 | CPU family: 21, Model: 1, Model name: AMD Opteron(TM) Processor 6276, Stepping: 2 |
| 06-2d-07 | unclear | 333 | E5-2630L 0 @ 2.00GHz, E5-2650 0 @ 2.00GHz |
| 06-2c-02 | unlikely | 168 | E5645 @ 2.40GHz, L5640 @ 2.27GHz, X5660 @ 2.80GHz |
These risks were explained by the CERN security team to the end users in their regular blogs.
Evaluating the performance impact
The High Energy Physics community uses a suite called HEPSPEC06 to benchmark compute resources. These are synthetic programs based on the C++ components of SPEC CPU2006 which match the instruction mix of typical physics programs. With this benchmark, we have started to re-benchmark the majority of the CPU models we have in the data centres, both on the physical hosts and on the guests. The measured performance loss across all architectures tested so far is about 2.5% in HEPSPEC06 (a number also confirmed by one of the LHC experiments using their real workloads), with a few cases approaching 7%. So for our physics codes, the effect of patching seems measurable, but much smaller than many expected.
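The loss figures quoted above are relative differences between benchmark scores before and after patching. A minimal sketch of that arithmetic follows; the function name and the two scores are placeholders chosen for illustration, not CERN measurements.

```python
def relative_loss_percent(score_before, score_after):
    """Return the percentage drop of a benchmark score after patching."""
    if score_before <= 0:
        raise ValueError("score_before must be positive")
    return (score_before - score_after) / score_before * 100.0


# Placeholder scores, chosen only to illustrate the calculation.
# They are not measurements from the CERN data centres.
print(f"{relative_loss_percent(200.0, 195.0):.1f}% loss")
```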
Test the upgrade procedure and stability
With our environment based on CentOS and Scientific Linux, the deployment of the updates for Meltdown and Spectre was dependent on the upstream availability of the patches. These could be broken down into several parts:
- Firmware for the processors: the microcode_ctl packages provide additional patches to protect against some parts of Spectre. This package proved very dynamic, as new processor firmware was being added on a regular basis, and it was not always clear when an update needed to be applied: the package version would increase, but the new version did not always include an update for a particular hardware type. The Intel release notes contain entries such as "HSX C0(06-3f-02:6f) 3a->3b", which says that the microcode for processor 06-3f-02:6f is upgraded from release 0x3a to 0x3b. The fields are the CPU family, model and stepping from /proc/cpuinfo, and the running firmware level can be found in /sys/devices/system/cpu/cpu0/microcode/version. A simple script (spectre-cpu-microcode-checker.sh) was made available to end users so they could check their systems, and the administrators also used it to validate the central IT services.
- For the operating system, we used a second script (spectre-meltdown-checker.sh), derived from the upstream GitHub code at https://github.com/speed47/spectre-meltdown-checker. The team maintaining this package was very responsive in incorporating our patches so that other sites could benefit from the combined analysis.
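The microcode check described above can be sketched in a few lines. This is not the spectre-cpu-microcode-checker.sh script itself. It is a minimal illustration that assumes the family, model and stepping values in /proc/cpuinfo are decimal and that the signature writes them as two-digit hexadecimal. The function names, the regular expression and the sample text are hypothetical, and the part of the release note after the colon (`6f`) is ignored here.

```python
import re

# Matches release-note entries such as "HSX C0(06-3f-02:6f) 3a->3b".
RELEASE_NOTE = re.compile(
    r"\((?P<sig>[0-9a-f]{2}-[0-9a-f]{2}-[0-9a-f]{2}):[0-9a-f]+\)\s+"
    r"(?P<old>[0-9a-f]+)->(?P<new>[0-9a-f]+)"
)


def cpu_signature(cpuinfo_text):
    """Build the family-model-stepping signature of the first CPU listed."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        if ":" not in line:
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, _, value = line.partition(":")
        fields.setdefault(key.strip(), value.strip())
    family = int(fields["cpu family"])
    model = int(fields["model"])
    stepping = int(fields["stepping"])
    return f"{family:02x}-{model:02x}-{stepping:02x}"


def needs_update(signature, running_version, release_note):
    """True if the note targets this CPU and offers a newer microcode level."""
    match = RELEASE_NOTE.search(release_note)
    if match is None or match.group("sig") != signature:
        return False
    return running_version < int(match.group("new"), 16)


# Sample text in the /proc/cpuinfo layout; on a real host one would read
# /proc/cpuinfo and /sys/devices/system/cpu/cpu0/microcode/version instead.
SAMPLE_CPUINFO = (
    "processor\t: 0\n"
    "cpu family\t: 6\n"
    "model\t\t: 63\n"
    "model name\t: Example CPU\n"
    "stepping\t: 2\n"
)

signature = cpu_signature(SAMPLE_CPUINFO)
print(signature, needs_update(signature, 0x3a, "HSX C0(06-3f-02:6f) 3a->3b"))
```

Comparing against the signature rather than the package version addresses the problem noted above: a newer microcode_ctl package does not necessarily carry an update for a given processor.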
Communication with the users
For the cloud, there are several resource consumers.
- IT service administrators who provide higher level functions on top of the CERN cloud. Examples include file transfer services, information systems, web frameworks and experiment workload management systems. While some are in the IT department, others are representatives of their experiments or supporters for online control systems such as those used to manage the accelerator infrastructure.
- End users who consume cloud resources by asking for virtual machines and using them as personal working environments. A typical case would be a macOS user who needs a Windows desktop: they create a Windows VM and use a protocol such as RDP to access it when required.
The communication approach was as follows:
- A meeting was held to discuss the risks of the exploits, the status of the operating systems and the plan for deployment across the production facilities. In the Q&A session, the major concerns raised were the potential impact on performance and the tuning options.
- An e-mail was sent to all owners of virtual machine resources informing them of the upcoming interventions.
- CERN management was informed of the risks and the plan for deployment.
CERN uses ServiceNow to provide a service desk for tickets and a status board of interventions and incidents. A single entry was used to communicate the current plans and status so that all cloud consumers could go to a single place for the latest information.
Execute the campaign
With the accelerator starting up again in March and the risk of the exploits, the approach taken was to complete the upgrades to the infrastructure in January, leaving February to find and resolve any residual problems. As the handling of the compute/batch part of the infrastructure was relatively straightforward (with only one service on top), we will focus in the following on the more delicate part: the hypervisors running services that support several thousand users in their daily work.
The layout of our infrastructure with its availability zones (AVZs) determined the overall structure and timeline of the upgrade. With effectively four AVZs in our data centre in Geneva and two AVZs for our remote resources in Budapest, we scheduled the upgrade for the services part of the resources over four days.
The main zones in Geneva were done one per day, with a break after the first one (GVA-A) in case there were unexpected difficulties to handle on the infrastructure or on the application side. The remaining zones were scheduled on consecutive days (GVA-B and GVA-C), the smaller ones (critical, WIG-A, WIG-B) in sequential order on the last day. This way we upgraded around 400 hosts with 4,000 guests per day.
Within each zone, hypervisors were divided into 'reboot groups' which were restarted and checked before the next group was handled. These groups were determined by the OpenStack cells underlying the corresponding AVZs. Since some services needed to limit their window of downtime, their hosting servers were moved to the special Group 1, the only one for which we could give a precise start time.
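The grouping described above can be sketched as follows. This is an illustration rather than the tooling we used: the function name, the host names and the cell names are all hypothetical, and the only logic taken from the text is that groups follow the OpenStack cells and that hosts of downtime-sensitive services go into a leading Group 1.

```python
from collections import defaultdict


def build_reboot_groups(hosts, limited_downtime_hosts):
    """Split (hostname, cell) pairs into ordered reboot groups.

    Hosts serving downtime-sensitive services form the first group;
    the remaining hosts are grouped by their cell.
    """
    group_one = [name for name, _ in hosts if name in limited_downtime_hosts]
    by_cell = defaultdict(list)
    for name, cell in hosts:
        if name not in limited_downtime_hosts:
            by_cell[cell].append(name)
    groups = [group_one] if group_one else []
    groups.extend(by_cell[cell] for cell in sorted(by_cell))
    return groups


# Hypothetical inventory for one availability zone.
inventory = [
    ("hv001", "cell-a"),
    ("hv002", "cell-a"),
    ("hv003", "cell-b"),
    ("hv004", "cell-b"),
]
for number, group in enumerate(build_reboot_groups(inventory, {"hv003"}), start=1):
    print(f"Group {number}: {group}")
```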
For each group several steps were performed:
- install all relevant packages
- check the next kernel is the desired one
- reset the BMC (needed for some specific hardware to prevent boot problems)
- log the nova and ping state of all guests
- stop all alarming
- stop nova
- shut down all instances via virsh
- reboot the hosts
- ... wait ... then fix hosts which did not come back
- check running kernel and vulnerability status on the rebooted hosts
- check and fix potential issues with the guests
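Logging the nova and ping state of all guests before the reboot makes the final check a simple comparison. The sketch below assumes each snapshot is a dictionary mapping a guest name to a (nova state, answers ping) pair. That layout, the function name and the sample guests are hypothetical, and collecting the snapshots is left out.

```python
def guests_needing_attention(before, after):
    """Compare guest snapshots taken before and after a host reboot.

    Each snapshot maps a guest name to (nova_state, answers_ping).
    Returns a dict of guest name -> short description of the problem.
    """
    problems = {}
    for guest, (state, pingable) in before.items():
        current = after.get(guest)
        if current is None:
            problems[guest] = "missing after reboot"
        elif current[0] != state:
            problems[guest] = f"nova state changed: {state} -> {current[0]}"
        elif pingable and not current[1]:
            problems[guest] = "no longer answers ping"
    return problems


# Hypothetical snapshots for three guests on one hypervisor.
before = {"vm-a": ("ACTIVE", True), "vm-b": ("ACTIVE", True), "vm-c": ("SHUTOFF", False)}
after = {"vm-a": ("ACTIVE", True), "vm-b": ("ACTIVE", False), "vm-c": ("SHUTOFF", False)}
print(guests_needing_attention(before, after))
```

Guests that were already shut off or unreachable before the intervention are not reported, so the list contains only problems introduced by the reboot.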
As some of the hypervisors in the cloud had very long uptimes, and this was the first time we systematically rebooted the whole infrastructure since the service went to full production about five years ago, we were not quite sure what kind of issues to expect, and in particular at what scale. To our relief, the problems encountered on the hosts hit less than 1% of the servers and included (in descending order of frequency):
- hosts stuck in shutdown (solved by IPMI reset)
- libvirtd stuck after reboot (solved by another reboot)
- hosts without network connectivity (solved by another reboot)
- hosts stuck in grub during boot (solved by reinstalling grub)
On the guest side, virtual machines were mostly fine when the underlying hypervisor was fine as well.
A few additional cases included:
- incomplete kernel upgrades, so the root partition could not be found (solved by booting back into an older kernel and reinstalling the desired kernel)
- file system issues (solved by running file system repairs)
So, despite initial worries, we hit no major issues when rebooting the whole CERN cloud infrastructure!
Conclusions
While this kind of security issue does not arise very often, the key parts of the campaign follow standard steps, namely assessing the risk, planning the update, communicating with the user community, executing the campaign and handling incomplete updates.
Using cloud availability zones to schedule the deployment allowed users to easily understand when there would be an impact on their virtual machines, and it encourages the good practice of load balancing resources across zones.
References
- CERN security advisory for Meltdown / Spectre
- CERN IT Status Board
Authors
- Arne Wiebalck
- Jan Van Eldik
- Tim Bell