Slurm version 23.02.07. If I have a QoS defined with a set number of, say, GPU devices in its GrpTRES, is there an easy way to list how much of that quota is allocated or, conversely, unallocated?
e.g.:
Name|Priority|GraceTime|Preempt|PreemptExemptTime|PreemptMode|Flags|UsageThres|UsageFactor|GrpTRES|GrpTRESMins|GrpTRESRunMins|GrpJobs|GrpSubmit|GrpWall|MaxTRES|MaxTRESPerNode|MaxTRESMins|MaxWall|MaxTRESPU|MaxJobsPU|MaxSubmitPU|MaxTRESPA|MaxJobsPA|MaxSubmitPA|MinTRES|
normal|0|00:00:00|||cluster|||1.000000|||||||||||cpu=3000,gres/gpu=20|||||||
dept1|1|00:00:00|||cluster|||1.000000|cpu=256,gres/gpu:1g.10gb=16,gres/gpu:2g.20gb=8,gres/gpu:3g.40gb=8,gres/gpu:a100.80gb=8|||||||||||||||||
dept2|1|00:00:00|||cluster|||1.000000|cpu=256,gres/gpu:1g.10gb=0,gres/gpu:2g.20gb=0,gres/gpu:3g.40gb=0,gres/gpu:a100.80gb=16|||||||||||||||||
So the dept1 and dept2 QoS are set on the same partition. How can a user with access to one or the other see whether there are available resources in the partition?
Hi Alastair,
I was holding off replying in the hope someone would have a good answer. In lieu of that, here’s my partial answer:
When I looked at reporting per-user and per-group QoS values a few months ago, I discovered that Slurm reports the information via this command:
scontrol -o show assoc_mgr flags=qos
I haven’t found any documentation explaining the format of that output. It seems to be parsable, but I’m not sure whether the format will change in later versions of Slurm. I’m using Perl regexps for my own reporting, but here’s a grep-based example that extracts the per-group CPU limit and works on my setup:
scontrol -o show assoc_mgr flags=qos|grep QOS=dept1|grep -o 'GrpTRES=[^ ]*'|grep -o 'cpu=[0-9]*'
That information is available to all Slurm users. But given the different contexts in which a QoS can be used, I’m not sure how you could limit reporting to only those users who are permitted to use a specific QoS.
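Since the original question is about allocated versus unallocated quota, the same output can be taken one step further. What follows is only a minimal sketch, not something from my setup: it assumes that `-o` puts each QoS record on one line beginning with `QOS=name(id)`, and that every GrpTRES entry has the form `name=limit(used)`, where `N` means no limit and the number in parentheses is the amount currently in use. The function name `qos_usage` is made up for illustration, so check the assumptions against your own output first.

```shell
# qos_usage (hypothetical helper): read `scontrol -o show assoc_mgr flags=qos`
# output on stdin and print "TRES limit used free" for each GrpTRES entry
# of the QoS named in $1.
# ASSUMPTION: entries look like name=limit(used); "N" means unlimited.
qos_usage() {
    awk -v qos="$1" '
        # Match the QoS name exactly, so "dept1" does not also match "dept10".
        $1 ~ ("^QOS=" qos "\\(") {
            for (i = 2; i <= NF; i++) {
                # "GrpTRES=" only; GrpTRESMins= and GrpTRESRunMins= are skipped.
                if ($i !~ /^GrpTRES=/) continue
                list = $i
                sub(/^GrpTRES=/, "", list)
                n = split(list, tres, ",")
                for (j = 1; j <= n; j++) {
                    # Skip anything that is not name=limit(used).
                    if (!match(tres[j], /=[0-9N]+\([0-9]+\)$/)) continue
                    name = substr(tres[j], 1, RSTART - 1)
                    split(substr(tres[j], RSTART + 1), lu, /[()]/)
                    free = (lu[1] == "N") ? "unlimited" : lu[1] - lu[2]
                    print name, lu[1], lu[2], free
                }
            }
        }'
}
```

It would be used as `scontrol -o show assoc_mgr flags=qos | qos_usage dept1`. Matching on `QOS=dept1(` rather than `QOS=dept1` is deliberate, since a plain grep for the latter would also pick up a QoS called dept10.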
And for completeness, here’s a similar method for extracting per-user GPU limits:
scontrol -o show assoc_mgr flags=qos|grep QOS=myqosname|grep -o 'myusername([0-9]*)={[^}]*}'|grep -o 'MaxTRESPU=[^ ]*'|grep -o 'gres/gpu=[0-9]*([0-9]*)'
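The per-user pipeline can be wrapped so that the QoS and user names become arguments and the result is labelled. This is a sketch under the same assumptions as the grep chain above, plus one more that the thread does not confirm: that the number in parentheses after the limit is the user's current usage. The function name `user_gpu_usage` is hypothetical.

```shell
# user_gpu_usage (hypothetical helper): read `scontrol -o show assoc_mgr flags=qos`
# output on stdin; $1 is the QoS name, $2 is the user name.
# ASSUMPTION: per-user entries look like user(uid)={... MaxTRESPU=...,gres/gpu=limit(used)}
user_gpu_usage() {
    grep "QOS=$1(" |                          # one line per QoS record with -o
    grep -o "$2([0-9]*)={[^}]*}" |            # this user's block of limits
    grep -o 'MaxTRESPU=[^ }]*' |              # the per-user TRES limit list
    grep -o 'gres/gpu=[0-9N]*([0-9]*)' |      # generic GPU entry; N = no limit
    sed 's/^gres\/gpu=\([0-9N]*\)(\([0-9]*\))$/limit=\1 used=\2/'
}
```

It would be called as `scontrol -o show assoc_mgr flags=qos | user_gpu_usage myqosname myusername`. It prints nothing if the user has no entry under that QoS or if the limit is set on a typed GPU such as gres/gpu:a100.80gb rather than on gres/gpu.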
Regards, Mike
From: Alastair Neil via slurm-users slurm-users@lists.schedmd.com
Sent: Tuesday, February 6, 2024 11:30 PM
To: slurm-users@schedmd.com
Subject: [External] [slurm-users] Is there a way to list allocated/unallocated resources defined in a QoS?