[slurm-users] [External] Is it possible to set a default QOS per partition?
Prentice Bisbal
pbisbal at pppl.gov
Mon Mar 1 22:26:13 UTC 2021
Two things:
1. So your users are okay with specifying a partition, but specifying a
QOS is a bridge too far?
2. Have your job_submit.lua script filter the jobs into the correct QOS.
You can check the partition and set the QOS accordingly.
First, you need to have this set in your slurm.conf:
JobSubmitPlugins=job_submit/lua
Note that no job submit plugins are loaded by default, so this line has
to be added explicitly.
Since it looks like your partitions and corresponding QOSes have the
same names, you can just add this line to the slurm_job_submit function
body in your job_submit.lua script:
job_desc.qos = job_desc.partition
And voila! Problem solved.
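
For context, here is a minimal sketch of what the whole job_submit.lua
might look like with that line in place. It assumes the QOS names match
the partition names exactly, as in the slurm.conf excerpt quoted below,
and that job_desc.qos and job_desc.partition are nil when the user did
not pass --qos or -p. Unlike the bare one-liner, it keeps an explicit
--qos and skips comma-separated partition lists; both guards are my own
additions, not something the setup requires. The stub for the `slurm`
table is there only so the file can be run outside slurmctld.

```lua
-- job_submit.lua (sketch): default a job's QOS to its partition name.
-- Inside slurmctld the global `slurm` table is provided by the plugin;
-- the stub below is an assumption used only for standalone testing.
slurm = slurm or { SUCCESS = 0, log_info = function(...) end }

function slurm_job_submit(job_desc, part_list, submit_uid)
    local part = job_desc.partition
    -- Only fill in a QOS when the user gave none and named exactly one
    -- partition; a list such as "compute,gpu" has no single matching QOS.
    if job_desc.qos == nil and part ~= nil
       and not string.find(part, ",", 1, true) then
        job_desc.qos = part
    end
    return slurm.SUCCESS
end

-- slurmctld expects this function to be defined as well.
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

With this in place, "sbatch -p long script.sh" should end up in the long
QOS, while a job submitted with no -p at all is left untouched and falls
back to the DefaultQOS of the association.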
After editing job_submit.lua, you'll need to restart slurmctld for the
changes to take effect. Also, it's a good idea to 'tail -f'
slurmctld.log while restarting - any syntax errors will be printed
there, and if there are any errors in that file, slurmctld won't
start.
--
Prentice
On 3/1/21 4:24 PM, Stack Korora wrote:
> Greetings,
>
> We have different node classes that we've set up in different
> partitions. For example, we have our standard compute nodes in
> compute; our GPUs in a gpu partition; and jobs that need to run for
> months go into a long partition with a different set of machines.
>
> For each partition, we have QOS to prevent any single user from
> dominating the resources (set at a max of 60% of resources; not my
> call - it's politics - I'm not going down that rabbit hole...).
>
> Thus, I've got something like this in my slurm.conf (abbreviating to
> save space; sorry if I trim too much).
>
> PartitionName=compute [snip] AllowQOS=compute Default=YES
> PartitionName=gpu [snip] AllowQOS=gpu Default=NO
> PartitionName=long [snip] AllowQOS=long Default=NO
>
> Then I have my QOS configured. And in my `sacctmgr dump cluster | grep
> DefaultQOS` I have "DefaultQOS=compute".
>
> All of that works exactly as expected.
>
> This makes it easy/nice for my users to just do something like:
> $ sbatch -n1 -N1 -p compute script.sh
>
> They don't have to specify the QOS for compute and they like this.
>
> However, for the other partitions they have to do something like this:
> $ sbatch -n1 -N1 -p long --qos=long script.sh
>
> The users don't like this. (though with scripts, I don't see the big
> deal in just adding a new line...but you know... users...)
>
> The request from the users is to make a default QOS for each partition
> thus not needing to specify the QOS for the other partitions.
>
> Because the default is set in the cluster configuration, I'm not sure
> how to do this. And I'm not seeing anything in the documentation for a
> scenario like this.
>
> Question A:
> Anyone know how I can set a default QOS per partition?
>
> Question B:
> Chesterton's fence and all... Is there a better way to accomplish what
> we are attempting to do? I don't want a single QOS to limit across all
> partitions. I need a per partition limit that restricts users to 60%
> of resources in that partition.
>
> Thank you!
> ~Stack~
>
--
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
http://www.pppl.gov