Hello,
TL;DR: How does the “relative” QOS flag work?
I have a QOS that I want collectively restricted to 50% of the reachable cores in the cluster. I’ve been managing this by dividing my core count by 2 to get N, and running ‘sacctmgr update qos foobar set MaxTRES=cpu=N’. That’s fine, except that N changes frequently as worker nodes come and go, so I have an hourly cronjob that enforces this.
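For anyone wanting to picture the hourly job, here is a minimal Python sketch of that kind of enforcement, not the actual script. It assumes the core count comes from `sinfo -h -o %C` (which prints allocated/idle/other/total CPUs), that “reachable” means allocated plus idle, and that `sacctmgr -i` is acceptable for skipping the confirmation prompt. The function names are hypothetical.

```python
import subprocess


def reachable_half(sinfo_cpus: str) -> int:
    """Return half of the reachable cores from sinfo's "%C" field.

    The field has the form "allocated/idle/other/total". Treating
    allocated + idle as "reachable" is an assumption of this sketch.
    """
    allocated, idle, _other, _total = (int(x) for x in sinfo_cpus.strip().split("/"))
    return (allocated + idle) // 2


def build_update_command(qos: str, cpu_limit: int) -> list:
    """Build the sacctmgr invocation that sets the QOS CPU cap."""
    return ["sacctmgr", "-i", "update", "qos", qos, "set", f"MaxTRES=cpu={cpu_limit}"]


def apply_limit(qos: str) -> None:
    """Query the cluster and apply the 50% cap (call this from cron)."""
    out = subprocess.run(
        ["sinfo", "-h", "-o", "%C"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Only the first output line is used; that is an assumption about
    # how sinfo aggregates CPU counts on a given cluster.
    limit = reachable_half(out.splitlines()[0])
    subprocess.run(build_update_command(qos, limit), check=True)
```

A cron entry would then call something like `apply_limit("foobar")` once an hour.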
The problem with *that* is that the values you get back from ‘sacctmgr show qos’ are often in different units than the ones you input (e.g. gigs instead of megs of RAM), which means I need to do some annoying parsing, will miss minor rounding errors, etc. (Using JSON solves that problem but introduces others that I do not want to deal with.) So while looking for a better way I noticed that QOSs support a flag called “relative”, which says “the QOS limits will be treated as percentages of a cluster/partition instead of absolutes”. Perfect! However, it’s not very clear to me what that means in practice.
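To illustrate the parsing annoyance, here is a rough Python sketch of normalizing TRES strings before comparing them. It assumes suffixes are 1024-based, that a bare memory value is in megabytes, and that a 1% tolerance is an acceptable allowance for rounding; none of these are confirmed against sacctmgr’s actual output rules, and the helper names are hypothetical.

```python
# Assumed powers of 1024 for each unit suffix.
_POWERS = {"K": 1, "M": 2, "G": 3, "T": 4, "P": 5}


def parse_tres(tres: str) -> dict:
    """Parse e.g. "cpu=90,mem=4G" into {"cpu": 90.0, "mem": 4096.0}.

    Memory is normalized to megabytes (assumed base unit); other TRES
    are normalized to plain counts.
    """
    limits = {}
    for item in tres.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, raw = item.partition("=")
        raw = raw.strip()
        power = 0
        if raw and raw[-1].upper() in _POWERS:
            power = _POWERS[raw[-1].upper()]
            raw = raw[:-1]
            if name == "mem":
                # Express memory relative to megabytes rather than bytes.
                power -= _POWERS["M"]
        limits[name] = float(raw) * 1024 ** power
    return limits


def same_limits(a: str, b: str, tolerance: float = 0.01) -> bool:
    """Compare two TRES strings, ignoring order, units, and small rounding."""
    la, lb = parse_tres(a), parse_tres(b)
    if la.keys() != lb.keys():
        return False
    return all(
        abs(la[k] - lb[k]) <= tolerance * max(abs(la[k]), abs(lb[k]), 1)
        for k in la
    )
```

With this, `same_limits("cpu=90,mem=4G", "mem=4096M,cpu=90")` treats the two strings as equal, while a real change such as `cpu=90` versus `cpu=91` is still detected.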
For example, with the “relative” flag set, the MaxTRES units still seem to work the same way: it still expects metric suffixes (“G”, “M”, etc.) and won’t accept periods. Is anyone actually using this flag, and if so, can you tell me which attributes it affects, and in what way?
--
Chip Seraphine
Grid Operations