LHC Tunnel

LHC Tunnel

Saturday 21 March 2015

Nova quota usage - synchronization

Nova quota usage gets frequently out of sync with the real usage consumption.
We are hitting this problem since a couple of releases and it’s increasing with the number of users/tenants in the CERN Cloud Infrastructure.

In nova there are two configuration options (“max_usage” and “until_refresh”) that define when the quota usage should be refreshed. In our case we have configured them with “-1” which means the quota usage must be refreshed every time “_is_quota_refresh_needed” method is called.
For more information about these options you can see a great blog post by Mike Dorman at http://t.co/Q5X1hTgJG1

This worked well in the releases before Havana. The quota gets out of sync and it’s refreshed next time a tenant user performs an operation (ex: create/delete/…).
However, in Havana with the introduction of “user quotas” (https://wiki.openstack.org/wiki/ReleaseNotes/Havana#Quota) this problem started to be more frequent even when forcing the quota to refresh every time.

At CERN Cloud Infrastructure a tenant usually has several users. When a user creates/deletes/… an instance and the quota gets out of sync it will affect all users in the tenant. The quota refresh only updates the resources of the user that is performing the operation and not all tenant resources. This means that in a tenant the quota usage will only be fixed if the user owner of the resource out of sync performs an operation.

The source of quota desync is very difficult to reproduce. In fact all our tries have failed to reproduce it consistently.
In order to fix the quota usage the operator needs to manually calculate the quota that is in use and update the database. This process is very cumbersome, time consuming and is can lead to the introduction of even more inconsistencies in the database.

In order to improve our operations we developed a small tool to check which quotas are out of sync and fix them if necessary.
The tool is available in CERN Operations github at: https://github.com/cernops/nova-quota-sync

How to use it?

usage: nova-quota-sync [-h] [--all] [--no_sync] [--auto_sync]
                       [--project_id PROJECT_ID] [--config CONFIG]

optional arguments:
  -h, --help            show this help message and exit
  --all                 show the state of all quota resources
  --no_sync             don't perform any synchronization of the mismatch
  --auto_sync           automatically sync all resources (no interactive)
  --project_id PROJECT_ID
                        searches only project ID

  --config CONFIG       configuration file

The tool calculates the resources in use and compares them with the quota usages.
For example, to see all resources in quota usages that are out of sync:

# nova-quota-sync --no_sync

| Project ID  | User ID  |  Instances   |     Cores      |         Ram          |  Status  |
| 58ed2d48... | user_a   |  657 -> 650  |  2628 -> 2600  |  5382144 -> 5324800  | Mismatch |
| 6f999252... | user_b   |    9 -> 8    |    13 -> 11    |    25088 -> 20992    | Mismatch |
| 79d8d0a2... | user_c   |  232 -> 231  |  5568 -> 5544  |  7424000 -> 7392000  | Mismatch |
| 827441b0... | user_d   |   42 -> 41   |    56 -> 55    |   114688 -> 112640   | Mismatch |
| 8a5858da... | user_e   |    2 -> 4    |     2 -> 4     |     1024 -> 2048     | Mismatch |

The quota usage synchronization can be performed interactively per tenant/project (don’t specify the argument --no_sync) or automatically for all “mismatch” resources with the argument “--auto-sync”.

This tool needs access to nova database. The database endpoint should be defined in the configuration file (it can be nova.conf). Since it reads and updates the database be extremely careful when using it.

Note that quota reservations are not considered in the calculations or updated.

No comments:

Post a Comment