[slurm-users] Q about setting up CPU limits

Carsten Beyer beyer at dkrz.de
Thu Sep 23 11:18:46 UTC 2021


Hi Dj,

the solution could be two QOSes. We use something similar to restrict 
usage of GPU nodes (MaxTRESPU=node=2). The examples below are from our 
test cluster, so the limits there are smaller (cpu=10 and cpu=40 
instead of cpu=200 and cpu=400).

1) Create a QOS with e.g. MaxTRESPU=cpu=200 and assign it to your 
partition:

[root at bta0 ~]# sacctmgr -s show qos maxcpu format=Name,MaxTRESPU
       Name     MaxTRESPU
---------- -------------
     maxcpu        cpu=10
[root at bta0 ~]#
[root at bta0 ~]# scontrol show part maxtresputest
PartitionName=maxtresputest
    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
    AllocNodes=ALL Default=NO QoS=maxcpu
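
For completeness, a rough sketch of the commands behind this setup (QOS 
and partition names match the outputs above; in your case you would use 
cpu=200 instead of our test value, and the partition QoS is best set in 
slurm.conf so it survives a restart):

[root at bta0 ~]# sacctmgr add qos maxcpu
[root at bta0 ~]# sacctmgr modify qos maxcpu set MaxTRESPerUser=cpu=10

and in slurm.conf, followed by 'scontrol reconfigure':

PartitionName=maxtresputest Nodes=<your nodes> Default=NO QoS=maxcpu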

If a user's jobs request more CPUs than the QOS allows, the excess 
(new) jobs stay pending with reason 'QOSMaxCpuPerUserLimit' in squeue:

kxxxxxx at btlogin1% squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            125316 maxtrespu maxsubmi  kxxxxxx PD       0:00      1 (QOSMaxCpuPerUserLimit)
            125317 maxtrespu maxsubmi  kxxxxxx PD       0:00      1 (QOSMaxCpuPerUserLimit)
            125305 maxtrespu maxsubmi  kxxxxxx  R       0:45      1 btc30
            125306 maxtrespu maxsubmi  kxxxxxx  R       0:45      1 btc30

2) Create a second QOS with Flags=DenyOnLimit,OverPartQOS and 
MaxTRESPU=cpu=400. Assign it to a user who should be allowed to exceed 
the partition limit of 200 CPUs; they will then be limited to 400 
instead. That user has to request this QOS when submitting new jobs, 
e.g.

[root at bta0 ~]# sacctmgr -s show qos overpart format=Name,Flags%30,MaxTRESPU
       Name                          Flags     MaxTRESPU
---------- ------------------------------ -------------
   overpart        DenyOnLimit,OverPartQOS        cpu=40
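
Again a sketch of the commands (kxxxxxx stands for the user, as in the 
squeue output above; use cpu=400 instead of our test value cpu=40):

[root at bta0 ~]# sacctmgr add qos overpart
[root at bta0 ~]# sacctmgr modify qos overpart set Flags=DenyOnLimit,OverPartQOS MaxTRESPerUser=cpu=40
[root at bta0 ~]# sacctmgr modify user name=kxxxxxx set QOS+=overpart

The user then requests the QOS at submission time, e.g.

kxxxxxx at btlogin1% sbatch --qos=overpart jobscript.sh

Because of DenyOnLimit, a single job that requests more than the QOS 
allows is rejected at submission time instead of pending forever.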


Cheers,
Carsten

-- 
Carsten Beyer
Systems Department

Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45a * D-20146 Hamburg * Germany

Phone:  +49 40 460094-221
Fax:    +49 40 460094-270
Email:  beyer at dkrz.de
URL:    http://www.dkrz.de

Managing Director: Prof. Dr. Thomas Ludwig
Registered office: Hamburg
Registered at Amtsgericht Hamburg, HRB 39784



On 22.09.2021 at 20:57, Dj Merrill wrote:
> Hi all,
>
> I'm relatively new to Slurm, and my Internet searches so far have 
> turned up lots of examples from the client perspective, but not from 
> the admin perspective on how to set this up. I'm hoping someone can 
> point us in the right direction.  This should be pretty simple...  
> :-)
>
> We have a test cluster running Slurm 21.08.1 and are trying to figure 
> out how to set a limit of 200 CPU cores that can be requested in a 
> partition.  Basically, if someone submits a thousand single-core 
> jobs, it should run 200 of them; the other 800 wait in the queue, and 
> as each job finishes the next one from the queue starts.  Likewise, 
> if someone has a 180-core job running and submits a 30-core job, the 
> new job should wait in the queue until the 180-core job finishes.  If 
> someone submits a job requesting 201 CPU cores, it should fail with 
> an error.
>
> According to the Slurm resource limits hierarchy, if a partition 
> limit is set, we should be able to set up a user association to 
> override it, for example where we want someone to be able to access 
> 300 CPU cores in that partition.
>
> I can see in the Slurm documentation how to set a maximum number of 
> nodes per partition, but have not been able to find how to do the 
> same with CPU cores.
>
> My questions are:
>
> 1) How do we set up a CPU core limit on a partition that applies to 
> all users?
>
> 2) How do we set up a user association to allow a single person to 
> use more than the default CPU core limit set on the partition?
>
> 3) Is there a better way to accomplish this than the method I'm 
> asking about?
>
>
> For reference, Slurm accounting is set up, GPU allocations are 
> working properly, and I think we are close but just missing something 
> obvious for setting up the CPU core limits.
>
>
> Thank you,
>
>
> -Dj
>
