[slurm-users] Q about setting up CPU limits
Dj Merrill
slurm at deej.net
Fri Sep 24 20:33:07 UTC 2021
Thank you Carsten. I'll take a closer look at the QOS limit approach.
If I'm understanding the documentation correctly, partition limits
(non-QOS) are set in the slurm.conf file, and although there are options
for limiting the max number of nodes for a person and the max CPUs per
node, there isn't an option within slurm.conf to limit the max total
number of CPUs that someone can use, so my original approach will not work.
The QOS option you mention seems to be the way to do it in order to set
a default limit for everyone on the partition.
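
For the archive, here is a rough sketch of how I read the two-QOS setup, untested on my side. The QOS names (maxcpu, overpart) follow Carsten's example; the user name "alice" and the run() dry-run helper are placeholders of mine. The script only prints the sacctmgr commands unless DRY_RUN=0 is set. I'm also assuming AccountingStorageEnforce in slurm.conf includes "limits", otherwise I don't believe the QOS limits are enforced.

```shell
#!/bin/sh
# Sketch only: per-user CPU cap on a partition via a partition QOS,
# plus a second QOS that lets selected users exceed it.
# Commands are printed, not executed, unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: QOS with a per-user CPU limit, to be attached to the partition.
make_partition_qos() {
    qos="$1"; cpus="$2"
    run sacctmgr -i add qos "$qos"
    run sacctmgr -i modify qos "$qos" set MaxTRESPerUser=cpu="$cpus"
}

# Step 2: QOS that overrides the partition QOS for one (hypothetical) user.
make_override_qos() {
    qos="$1"; cpus="$2"; user="$3"
    run sacctmgr -i add qos "$qos"
    run sacctmgr -i modify qos "$qos" set Flags=DenyOnLimit,OverPartQOS MaxTRESPerUser=cpu="$cpus"
    run sacctmgr -i modify user name="$user" set qos+="$qos"
}

make_partition_qos maxcpu 200
make_override_qos overpart 400 alice
```

After that, the partition line in slurm.conf would need QOS=maxcpu added, followed by "scontrol reconfigure", and the exempted user would submit with "sbatch --qos=overpart".
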
The only other approach I can see would be to set an association limit
for every account individually.
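
Roughly, that per-account route might look like the sketch below. The account names and the 200-CPU value are made up, and the helper only prints the commands for review. Note that GrpTRES on an account caps the account as a whole (all of its users combined), so it is not quite the same thing as a per-person limit.

```shell
#!/bin/sh
# Sketch only: emit one "sacctmgr modify account" command per account name
# read from stdin. CPU_LIMIT and the sample account names are placeholders.
CPU_LIMIT="${CPU_LIMIT:-200}"

emit_account_limits() {
    while IFS= read -r acct; do
        # Skip blank lines in the input.
        [ -n "$acct" ] || continue
        echo "sacctmgr -i modify account name=$acct set GrpTRES=cpu=$CPU_LIMIT"
    done
}

# On a real cluster the list could come from:
#   sacctmgr -nP show account format=Account | emit_account_limits
printf '%s\n' physics chemistry | emit_account_limits
```
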
Thank you,
-Dj
On 9/23/21 07:18, Carsten Beyer wrote:
> Hi Dj,
>
> The solution could be to use two QOS. We use something similar to
> restrict usage of GPU nodes (MaxTresPU=node=2). The examples below are
> from our test cluster.
>
> 1) create a QOS with e.g. MaxTresPU=cpu=200 and assign it to your
> partition, e.g.
>
> [root@bta0 ~]# sacctmgr -s show qos maxcpu format=Name,MaxTRESPU
>       Name     MaxTRESPU
> ---------- -------------
>     maxcpu        cpu=10
> [root@bta0 ~]#
> [root@bta0 ~]# scontrol show part maxtresputest
> PartitionName=maxtresputest
>    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
>    AllocNodes=ALL Default=NO QoS=maxcpu
>
> If a user submits jobs that would take him over the CPU limit, his new
> jobs stay pending with the reason 'QOSMaxCpuPerUserLimit' in squeue.
>
> kxxxxxx@btlogin1% squeue
>  JOBID PARTITION     NAME    USER ST  TIME NODES NODELIST(REASON)
> 125316 maxtrespu maxsubmi kxxxxxx PD  0:00     1 (QOSMaxCpuPerUserLimit)
> 125317 maxtrespu maxsubmi kxxxxxx PD  0:00     1 (QOSMaxCpuPerUserLimit)
> 125305 maxtrespu maxsubmi kxxxxxx  R  0:45     1 btc30
> 125306 maxtrespu maxsubmi kxxxxxx  R  0:45     1 btc30
>
> 2) create a second QOS with Flags=DenyOnLimit,OverPartQoS and
> MaxTresPU=cpu=400. Assign it to a user who should be allowed to exceed
> the limit of 200 cpus; he will then be limited to 400. That user has to
> request this QOS when submitting new jobs, e.g.
>
> [root@bta0 ~]# sacctmgr -s show qos overpart format=Name,Flags%30,MaxTRESPU
>       Name                          Flags     MaxTRESPU
> ---------- ------------------------------ -------------
>   overpart        DenyOnLimit,OverPartQOS        cpu=40
>
>
> Cheers,
> Carsten
>