[slurm-users] [External] Is it possible to set a default QOS per partition?
pbisbal at pppl.gov
Mon Mar 1 22:26:13 UTC 2021
1. So your users are okay with specifying a partition, but specifying a
QOS is a bridge too far?
2. Have your job_submit.lua script filter the jobs into the correct QOS.
You can check the partition and set the QOS accordingly.
First, you need to have this set in your slurm.conf:
JobSubmitPlugins=lua
But I'm pretty sure that's the default setting.
Since it looks like your partitions and corresponding QOSes have the
same names, you can just add this line to the slurm_job_submit function
body in your job_submit.lua script:
job_desc.qos = job_desc.partition
And voila! Problem solved.
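For context, here is a minimal sketch of what a complete job_submit.lua could look like with that line in place. The guard that leaves an explicit --qos alone and skips jobs submitted without -p is an assumption about the behaviour you want, not something Slurm requires. The sketch also assumes that every partition has a QOS with the same name and that each job names a single partition (a comma-separated partition list would not match any QOS):

```lua
-- job_submit.lua: illustrative sketch, adjust to your site's policy.

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Assumption: only fill in a QOS when the user did not request one.
    -- job_desc.partition is nil when the job was submitted without -p,
    -- in which case the job is left to the default partition and DefaultQOS.
    if job_desc.qos == nil and job_desc.partition ~= nil then
        -- Assumes a QOS exists with exactly the same name as the partition.
        job_desc.qos = job_desc.partition
    end
    return slurm.SUCCESS
end

-- The lua plugin also expects a job modify hook; this one changes nothing.
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

With something like this, `sbatch -n1 -N1 -p long script.sh` should end up in the long QOS without the user typing --qos=long, and a job submitted with no -p should keep falling through to DefaultQOS=compute.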
After editing job_submit.lua, you'll need to restart slurmctld for the
changes to take effect. Also, it's a good idea to 'tail -f'
slurmctld.log while restarting - any syntax errors will be
printed there, and if there are any errors in that file, slurmctld won't
start.
On 3/1/21 4:24 PM, Stack Korora wrote:
> We have different node classes that we've set up in different
> partitions. For example, we have our standard compute nodes in
> compute; our GPUs in a gpu partition; and jobs that need to run for
> months go into a long partition with a different set of machines.
> For each partition, we have QOS to prevent any single user from
> dominating the resources (set at a max of 60% of resources; not my
> call - it's politics - I'm not going down that rabbit hole...).
> Thus, I've got something like this in my slurm.conf (abbreviating to
> save space; sorry if I trim too much).
> PartitionName=compute [snip] AllowQOS=compute Default=YES
> PartitionName=gpu [snip] AllowQOS=gpu Default=NO
> PartitionName=long [snip] AllowQOS=long Default=NO
> Then I have my QOS configured. And in my `sacctmgr dump cluster | grep
> DefaultQOS` I have "DefaultQOS=compute".
> All of that works exactly as expected.
> This makes it easy/nice for my users to just do something like:
> $ sbatch -n1 -N1 -p compute script.sh
> They don't have to specify the QOS for compute and they like this.
> However, for the other partitions they have to do something like this:
> $ sbatch -n1 -N1 -p long --qos=long script.sh
> The users don't like this. (though with scripts, I don't see the big
> deal in just adding a new line...but you know... users...)
> The request from the users is to make a default QOS for each partition
> thus not needing to specify the QOS for the other partitions.
> Because the default is set in the cluster configuration, I'm not sure
> how to do this. And I'm not seeing anything in the documentation for a
> scenario like this.
> Question A:
> Anyone know how I can set a default QOS per partition?
> Question B:
> Chesterton's fence and all... Is there a better way to accomplish what
> we are attempting to do? I don't want a single QOS to limit across all
> partitions. I need a per partition limit that restricts users to 60%
> of resources in that partition.
> Thank you!
Lead Software Engineer
Princeton Plasma Physics Laboratory