We have been hitting this problem for a couple of releases, and it is getting worse as the number of users/tenants in the CERN Cloud Infrastructure grows.
In nova there are two configuration options (“max_age” and “until_refresh”) that define when the quota usage should be refreshed. In our case we have configured them with “-1”, which means the quota usage is refreshed every time the “_is_quota_refresh_needed” method is called.
For more information about these options you can see a great blog post by Mike Dorman at http://t.co/Q5X1hTgJG1
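To make the role of the two options more concrete, here is a minimal sketch of how such a refresh check could behave. It is a simplified model written for this post and not nova's actual code: the UsageRow class and the refresh_needed function are hypothetical names, and the exact semantics in nova may differ between releases.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class UsageRow:
    """Hypothetical stand-in for one quota usage record."""
    in_use: int
    until_refresh: Optional[int]  # countdown of operations left before a refresh
    updated_at: datetime


def refresh_needed(row: UsageRow, max_age: int, now: datetime) -> bool:
    """Decide whether the usage in `row` should be recalculated (assumed logic)."""
    if row.in_use < 0:
        # A negative usage is clearly wrong, so always recalculate.
        return True
    if row.until_refresh is not None:
        # Count down on every check; refresh once the counter runs out.
        row.until_refresh -= 1
        if row.until_refresh <= 0:
            return True
    elif max_age and (now - row.updated_at).total_seconds() >= max_age:
        # The record is older than max_age seconds.
        return True
    return False


now = datetime(2015, 1, 1, 12, 0, 0)
fresh = UsageRow(in_use=5, until_refresh=-1, updated_at=now)
print(refresh_needed(fresh, max_age=-1, now=now))
```

In this model a counter that starts at -1 is already at or below zero after the first decrement, and an age limit of -1 is exceeded by any record, so both settings trigger a refresh on every check.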
This worked well in the releases before Havana. The quota gets out of sync and it is refreshed the next time a tenant user performs an operation (e.g. create/delete/…).
However, in Havana with the introduction of “user quotas” (https://wiki.openstack.org/wiki/ReleaseNotes/Havana#Quota) this problem started to be more frequent even when forcing the quota to refresh every time.
At CERN Cloud Infrastructure a tenant usually has several users. When a user creates/deletes/… an instance and the quota gets out of sync, all users in the tenant are affected. However, the quota refresh only updates the resources of the user performing the operation, not all tenant resources. This means that a tenant's quota usage is only fixed when the user who owns the out-of-sync resource performs an operation.
The source of the quota desync is very difficult to reproduce. In fact, all our attempts to reproduce it consistently have failed.
In order to fix the quota usage the operator needs to manually calculate the quota that is in use and update the database. This process is very cumbersome and time consuming, and it can introduce even more inconsistencies into the database.
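The effect of a per-user refresh can be illustrated with a small toy model. Everything in it is hypothetical (the dictionaries, the user names and the helper functions) and it does not reflect nova's data model; it only shows why the tenant total stays wrong until the right user performs an operation.

```python
def refresh_user(recorded, actual, project, user):
    """Recalculate the recorded usage of ONE user only (toy model)."""
    updated = dict(recorded)
    updated[(project, user)] = actual.get((project, user), 0)
    return updated


def project_usage(table, project):
    """Sum the usage of all users in a project."""
    return sum(count for (proj, _user), count in table.items() if proj == project)


# Instances that really exist, per (project, user).
actual = {("proj", "alice"): 3, ("proj", "bob"): 5}
# Recorded quota usage: bob's record has drifted from 5 to 7.
recorded = {("proj", "alice"): 3, ("proj", "bob"): 7}

# alice performs an operation: only her record is refreshed.
after_alice = refresh_user(recorded, actual, "proj", "alice")
print(project_usage(after_alice, "proj"), project_usage(actual, "proj"))

# bob performs an operation: now the tenant total is correct again.
after_bob = refresh_user(after_alice, actual, "proj", "bob")
print(project_usage(after_bob, "proj"), project_usage(actual, "proj"))
```

As long as only alice is active, the tenant keeps being charged for two instances that do not exist.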
In order to improve our operations we developed a small tool to check which quotas are out of sync and fix them if necessary.
The tool is available in CERN Operations github at: https://github.com/cernops/nova-quota-sync
How to use it?
usage: nova-quota-sync [-h] [--all] [--no_sync] [--auto_sync]
                       [--project_id PROJECT_ID] [--config CONFIG]

optional arguments:
  -h, --help            show this help message and exit
  --all                 show the state of all quota resources
  --no_sync             don't perform any synchronization of the mismatch
                        resources
  --auto_sync           automatically sync all resources (no interactive)
  --project_id PROJECT_ID
                        searches only project ID
  --config CONFIG       configuration file
The tool calculates the resources in use and compares them with the quota usages.
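The comparison can be sketched as follows. This is not the code of nova-quota-sync: it is a minimal illustration that uses an in-memory SQLite database with a deliberately simplified, assumed schema (two tables named instances and quota_usages with only a handful of columns), and the function names are hypothetical.

```python
import sqlite3

# Simplified, assumed schema; the real nova tables have many more columns.
SCHEMA = """
CREATE TABLE instances (project_id TEXT, user_id TEXT, vcpus INTEGER,
                        memory_mb INTEGER, deleted INTEGER DEFAULT 0);
CREATE TABLE quota_usages (project_id TEXT, user_id TEXT,
                           resource TEXT, in_use INTEGER);
"""


def actual_usage(conn):
    """Calculate the resources really in use per (project, user)."""
    rows = conn.execute(
        "SELECT project_id, user_id, COUNT(*), SUM(vcpus), SUM(memory_mb) "
        "FROM instances WHERE deleted = 0 GROUP BY project_id, user_id"
    ).fetchall()
    return {
        (project, user): {"instances": count, "cores": cores, "ram": ram}
        for project, user, count, cores, ram in rows
    }


def find_mismatches(conn):
    """Return (project, user, resource, recorded, calculated) for each mismatch."""
    actual = actual_usage(conn)
    mismatches = []
    rows = conn.execute(
        "SELECT project_id, user_id, resource, in_use FROM quota_usages "
        "ORDER BY project_id, user_id, resource"
    ).fetchall()
    for project, user, resource, in_use in rows:
        calculated = actual.get((project, user), {}).get(resource, 0)
        if calculated != in_use:
            mismatches.append((project, user, resource, in_use, calculated))
    return mismatches


def demo():
    """Build a tiny example where the usage of user_a has drifted."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO instances (project_id, user_id, vcpus, memory_mb) "
        "VALUES (?, ?, ?, ?)",
        [("p1", "user_a", 2, 4096), ("p1", "user_a", 2, 4096),
         ("p1", "user_b", 1, 2048)],
    )
    conn.executemany(
        "INSERT INTO quota_usages VALUES (?, ?, ?, ?)",
        [("p1", "user_a", "instances", 3), ("p1", "user_a", "cores", 6),
         ("p1", "user_a", "ram", 12288), ("p1", "user_b", "instances", 1),
         ("p1", "user_b", "cores", 1), ("p1", "user_b", "ram", 2048)],
    )
    return find_mismatches(conn)


for row in demo():
    print(row)
```

The sketch only reads the data and reports differences; updating the usage records is intentionally left out.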
For example, to see all resources in quota usages that are out of sync:
# nova-quota-sync --no_sync
+-------------+----------+--------------+----------------+----------------------+----------+
| Project ID | User ID | Instances | Cores | Ram | Status |
+-------------+----------+--------------+----------------+----------------------+----------+
| 58ed2d48... | user_a | 657 -> 650 | 2628 -> 2600 | 5382144 -> 5324800 | Mismatch |
| 6f999252... | user_b | 9 -> 8 | 13 -> 11 | 25088 -> 20992 | Mismatch |
| 79d8d0a2... | user_c | 232 -> 231 | 5568 -> 5544 | 7424000 -> 7392000 | Mismatch |
| 827441b0... | user_d | 42 -> 41 | 56 -> 55 | 114688 -> 112640 | Mismatch |
| 8a5858da... | user_e | 2 -> 4 | 2 -> 4 | 1024 -> 2048 | Mismatch |
+-------------+----------+--------------+----------------+----------------------+----------+
The quota usage synchronization can be performed interactively per tenant/project (don’t specify the argument “--no_sync”) or automatically for all “mismatch” resources with the argument “--auto_sync”.
This tool needs access to the nova database. The database endpoint should be defined in the configuration file (it can be nova.conf). Since it reads and updates the database, be extremely careful when using it.
Note that quota reservations are not considered in the calculations or updated.