[slurm-users] Cloud node utilization reporting
cseraphine at DRWHoldings.com
Tue Jun 6 15:33:45 UTC 2023
I’ve got a problem that I’d imagine others have as well, and I’m wondering how they handle it.
I produce periodic reports for my management showing, among other things, the overall “cluster utilization”, which we define as the ratio of CPU*Minutes allocated to CPU*Minutes available. It’s a simplistic but handy metric for projecting growth.
Currently I grab this by running “sreport cluster utilization” and dividing “allocated” by “allocated + idle”, which gives us a pretty reasonable number. However, we recently added some cloud-based partitions. I was hoping that idle nodes with state=CLOUD would not show up in this sreport output, but unfortunately they do. Our cloud partitions are almost never used (they are essentially for emergencies), but because they are quite large, they have dragged the computed utilization down enormously. Management is really only interested in the utilization of our on-prem components.
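To make the metric concrete, here is a minimal sketch of the calculation described above. The function name and the sample CPU-minute totals are made up for illustration; the two inputs stand for the “allocated” and “idle” figures from the sreport output.

```python
def cluster_utilization(allocated: float, idle: float) -> float:
    """Return allocated / (allocated + idle), or 0.0 if there is no capacity."""
    available = allocated + idle
    if available <= 0:
        return 0.0
    return allocated / available


# Hypothetical CPU-minute totals for one reporting period.
print(f"{cluster_utilization(allocated=720_000, idle=240_000):.1%}")  # prints 75.0%
```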
I can kludge this by manually subtracting out (number of CPUs in all cloud partitions) * (number of minutes in the reporting period), but that would require me to determine and add back in all allocated minutes for cloud jobs, keep track of intra-day changes to the partition sizes, etc.
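For what it’s worth, the kludge might look roughly like the sketch below. It is only an illustration under some assumptions: all names and numbers are hypothetical, cloud capacity is supplied as (CPUs, minutes) segments so that partition size changes within the period can be represented, the CPU-minutes allocated to cloud jobs are obtained separately, and every cloud CPU-minute that was not allocated is assumed to have been counted as idle.

```python
def cloud_capacity(segments):
    """Sum CPUs * minutes over periods in which the cloud partition size was constant."""
    return sum(cpus * minutes for cpus, minutes in segments)


def on_prem_utilization(allocated, idle, cloud_segments, cloud_allocated):
    """Utilization with the cloud partitions' contribution removed.

    allocated, idle  -- cluster-wide CPU-minutes (cloud nodes included)
    cloud_segments   -- list of (cpus, minutes) tuples for the cloud partitions
    cloud_allocated  -- CPU-minutes allocated to jobs on cloud nodes
    Assumes unallocated cloud CPU-minutes were all reported as idle.
    """
    cloud_idle = cloud_capacity(cloud_segments) - cloud_allocated
    on_prem_allocated = allocated - cloud_allocated
    on_prem_idle = idle - cloud_idle
    available = on_prem_allocated + on_prem_idle
    if available <= 0:
        return 0.0
    return on_prem_allocated / available


# Hypothetical numbers: 1000 cloud CPUs over two 720-minute segments,
# of which 10,000 CPU-minutes were actually allocated to cloud jobs.
raw = 730_000 / (730_000 + 1_670_000)
adjusted = on_prem_utilization(
    allocated=730_000,
    idle=1_670_000,
    cloud_segments=[(1000, 720), (1000, 720)],
    cloud_allocated=10_000,
)
print(f"raw: {raw:.1%}, on-prem only: {adjusted:.1%}")
```

The bookkeeping burden is visible in the inputs: both the segment list and the cloud-allocated total have to come from somewhere other than the utilization report itself.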
Are others encountering similar problems? And if so, how do you resolve them?