[slurm-users] Q about setting up CPU limits

Dj Merrill slurm at deej.net
Fri Sep 24 20:33:07 UTC 2021


Thank you Carsten.  I'll take a closer look at the QOS limit approach.

If I'm understanding the documentation correctly, partition limits (non 
QOS) are set via the slurm.conf file, and although there are options for 
limiting the max number of nodes for a person, and the max cpus per 
node, there isn't an option within slurm.conf to limit the max total 
number of cpus that someone can use, so my original approach will not work.

The QOS option you mention seems to be the way to do it in order to set 
a default limit for everyone on the partition.
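
For reference, a minimal sketch of creating such a partition QOS with sacctmgr. The QOS name and CPU cap are placeholder values taken from the examples quoted below, and the `run` helper is a hypothetical dry-run wrapper: it only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: cap the total CPUs each user can hold on a partition via a QOS.
# QOS_NAME and CPU_CAP are placeholders, not values from a real site.
QOS_NAME=${QOS_NAME:-maxcpu}
CPU_CAP=${CPU_CAP:-200}

# Hypothetical helper: print the command by default, execute it only
# when DRY_RUN=0, so nothing touches the accounting database by accident.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

setup_partition_qos() {
    # -i commits the change without the interactive confirmation prompt.
    run sacctmgr -i add qos "$QOS_NAME"
    # MaxTRESPerUser is the long form of MaxTRESPU.
    run sacctmgr -i modify qos "$QOS_NAME" set MaxTRESPerUser=cpu="$CPU_CAP"
}

setup_partition_qos
```

The QOS still has to be attached to the partition, which would be done with QOS=maxcpu on the PartitionName line in slurm.conf followed by "scontrol reconfigure". Enforcement presumably also depends on AccountingStorageEnforce including limits.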

The only other approach I can see would be to set an association limit 
for every account individually.
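
A rough sketch of what that per-association approach might look like, assuming "account" here means each user and that GrpTRES on the user's association is the limit wanted. The user names alice and bob are hypothetical, and the `run` helper only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: set the same CPU cap on each user's association, one by one.
# CPU_CAP and the user names are placeholders.
CPU_CAP=${CPU_CAP:-200}

# Hypothetical helper: print by default, execute only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

limit_users() {
    for user in "$@"; do
        # GrpTRES caps the CPUs in use by running jobs under the association.
        run sacctmgr -i modify user where name="$user" set GrpTRES=cpu="$CPU_CAP"
    done
}

limit_users alice bob
```

Unlike the partition QOS, this has to be repeated for every new user, and a user with several associations would get the cap on each of them separately.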

Thank you,

-Dj


On 9/23/21 07:18, Carsten Beyer wrote:
> Hi Dj,
>
> the solution could be to use two QOS. We use something similar to 
> restrict usage of GPU nodes (MaxTresPU=node=2). The examples below are 
> from our test cluster.
>
> 1) create a QOS with e.g. MaxTresPU=cpu=200 and assign it to your 
> partition, e.g.
>
> [root@bta0 ~]# sacctmgr -s show qos maxcpu format=Name,MaxTRESPU
>       Name     MaxTRESPU
> ---------- -------------
>     maxcpu        cpu=10
> [root@bta0 ~]#
> [root@bta0 ~]# scontrol show part maxtresputest
> PartitionName=maxtresputest
>    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
>    AllocNodes=ALL Default=NO QoS=maxcpu
>
> If a user submits jobs that would take him over the CPU limit, his new 
> jobs stay pending with the reason 'QOSMaxCpuPerUserLimit' in squeue.
>
> kxxxxxx@btlogin1% squeue
>              JOBID PARTITION     NAME     USER ST       TIME NODES NODELIST(REASON)
>             125316 maxtrespu maxsubmi  kxxxxxx PD       0:00      1 (QOSMaxCpuPerUserLimit)
>             125317 maxtrespu maxsubmi  kxxxxxx PD       0:00      1 (QOSMaxCpuPerUserLimit)
>             125305 maxtrespu maxsubmi  kxxxxxx  R       0:45      1 btc30
>             125306 maxtrespu maxsubmi  kxxxxxx  R       0:45      1 btc30
>
> 2) create a second QOS with Flags=DenyOnLimit,OverPartQoS and 
> MaxTresPU=cpu=400. Assign it to any user who should be allowed to 
> exceed the 200-CPU limit; he will then be capped at 400 CPUs instead. 
> That user has to request this QOS when submitting new jobs, e.g.
>
> [root@bta0 ~]# sacctmgr -s show qos overpart format=Name,Flags%30,MaxTRESPU
>       Name                          Flags     MaxTRESPU
> ---------- ------------------------------ -------------
>   overpart        DenyOnLimit,OverPartQOS        cpu=40
>
>
> Cheers,
> Carsten
>
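
The second, overriding QOS described in step 2 could be set up along these lines. This is a sketch under assumptions: the QOS name overpart and the 400-CPU cap come from the quoted message, the user name alice is hypothetical, and the `run` helper only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: a QOS that overrides the partition QOS for selected users.
# OVER_QOS and OVER_CAP are placeholders.
OVER_QOS=${OVER_QOS:-overpart}
OVER_CAP=${OVER_CAP:-400}

# Hypothetical helper: print by default, execute only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

grant_override() {
    user=$1
    run sacctmgr -i add qos "$OVER_QOS"
    # OverPartQOS lets this QOS take precedence over the partition QOS;
    # DenyOnLimit rejects jobs at submission that exceed the QOS limits.
    run sacctmgr -i modify qos "$OVER_QOS" set \
        Flags=DenyOnLimit,OverPartQOS MaxTRESPerUser=cpu="$OVER_CAP"
    # Add the QOS to the user's allowed list without replacing existing ones.
    run sacctmgr -i modify user where name="$user" set "qos+=$OVER_QOS"
}

grant_override alice
```

The user would then submit with something like "sbatch --qos=overpart job.sh", where job.sh stands in for the actual batch script.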