As Rob already mentioned, this could be a way for you (I created the
partition temporarily online just for testing). You could also add
MaxTRES=node=1 for further restriction. We do something similar with a
QOS to restrict the number of CPUs per user in certain partitions.
scontrol create partition=testtres qos=maxtrespu200g maxtime=08:00:00
nodes=lt[10000-10003] DefMemPerCPU=940 MaxMemPerCPU=940 OverSubscribe=NO
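The QOS referenced above has to exist before the partition can use it. A minimal sketch of how it might be created (the name maxtrespu200g and the mem=200G value are assumptions inferred from this test, not necessarily our production config):

```shell
# hypothetical: create a QOS capping each user's total memory allocation
# at 200G, then point the test partition at it
sacctmgr add qos maxtrespu200g set MaxTRESPerUser=mem=200G
scontrol update partitionname=testtres qos=maxtrespu200g
```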
---
[root@levantetest ~]# squeue
  JOBID PARTITION     NAME     USER ST   TIME NODES NODELIST(REASON)
    862  testtres hostname  xxxxxxx PD   0:00     1 (QOSMaxMemoryPerUser)
    861  testtres hostname  xxxxxxx PD   0:00     1 (QOSMaxMemoryPerUser)
    860  testtres hostname  xxxxxxx  R   0:15     1 lt10000
    859  testtres hostname  xxxxxxx  R   0:22     1 lt10000
---
[k202068@levantetest ~]$ squeue
  JOBID PARTITION     NAME     USER ST   TIME NODES NODELIST(REASON)
    876  testtres hostname  xxxxxxx PD   0:00     1 (QOSMaxMemoryPerUser)
    875  testtres hostname  xxxxxxx PD   0:00     1 (QOSMaxMemoryPerUser)
    874  testtres hostname  xxxxxxx  R   9:09     1 lt10000
    873  testtres hostname  xxxxxxx  R   9:15     1 lt10000
    872  testtres hostname  xxxxxxx  R   9:22     1 lt10000
    871  testtres hostname  xxxxxxx  R   9:26     1 lt10000
--
Carsten Beyer
Abteilung Systeme
Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45a * D-20146 Hamburg * Germany
Phone: +49 40 460094-221
Fax: +49 40 460094-270
Email: beyer@dkrz.de
URL: http://www.dkrz.de
Geschäftsführer: Prof. Dr. Thomas Ludwig
Sitz der Gesellschaft: Hamburg
Amtsgericht Hamburg HRB 39784
On 24.09.24 at 16:58, Guillaume COCHARD via slurm-users wrote:
> > "So if they submit a 2nd job, that job can start but will have to
> go onto another node, and will again be restricted to 200G? So they
> can start as many jobs as there are nodes, and each job will be
> restricted to using 1 node and 200G of memory?"
>
> Yes, that's it. We already have MaxNodes=1, so a job can't be spread
> across multiple nodes.
>
> To be more precise, the limit should be per user, not per job. To
> illustrate, imagine we have 3 empty nodes and a 200G/user/node limit.
> If a user submits 10 jobs, each requesting 100G of memory, there
> should be 2 jobs running on each node and 4 jobs pending.
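The arithmetic in that example, spelled out as a quick shell sketch (all numbers are taken from the text above, not from a real cluster):

```shell
# 3 empty nodes, a 200G-per-user-per-node limit, 10 jobs of 100G each
nodes=3
per_node_limit=200   # G of memory a single user may allocate per node
job_mem=100          # G requested by each job
submitted=10

jobs_per_node=$(( per_node_limit / job_mem ))  # 2 jobs fit per node
running=$(( jobs_per_node * nodes ))           # 6 jobs can run at once
pending=$(( submitted - running ))             # 4 jobs stay pending
echo "running=$running pending=$pending"
```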
>
> Guillaume
>
> ------------------------------------------------------------------------
> *From: *"Groner, Rob" <rug262@psu.edu>
> *To: *"Guillaume COCHARD" <guillaume.cochard@cc.in2p3.fr>
> *Cc: *slurm-users@lists.schedmd.com
> *Sent: *Tuesday, September 24, 2024 16:37:34
> *Subject: *Re: Max TRES per user and node
>
> Ah, sorry, I didn't catch that from your first post (though you did
> say it).
>
> So, you are trying to limit the user to no more than 200G of memory on
> a single node? So if they submit a 2nd job, that job can start but
> will have to go onto another node, and will again be restricted to
> 200G? So they can start as many jobs as there are nodes, and each job
> will be restricted to using 1 node and 200G of memory? Or can they
> submit a job asking for 4 nodes, where they are limited to 200G on
> each node? Or are they limited to a single node, no matter how many jobs?
>
> Rob
>
> ------------------------------------------------------------------------
> *From:* Guillaume COCHARD <guillaume.cochard@cc.in2p3.fr>
> *Sent:* Tuesday, September 24, 2024 10:09 AM
> *To:* Groner, Rob <rug262@psu.edu>
> *Cc:* slurm-users@lists.schedmd.com
> *Subject:* Re: Max TRES per user and node
> Thank you for your answer.
>
> To test it I tried:
> sacctmgr update qos normal set maxtresperuser=cpu=2
> # Then in slurm.conf
> PartitionName=test […] qos=normal
>
> But then if I submit several 1-CPU jobs, only two start and the others
> stay pending, even though I have several nodes available. So it seems
> that MaxTRESPerUser is a QoS-wide limit, and doesn't limit TRES per
> user and per node but rather per user and per QoS (or rather per
> partition, since I applied the QoS to the partition). Did I miss
> something?
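One way to confirm what limit is actually stored on the QOS is to query it with sacctmgr (a sketch; the MaxTRESPU format field name is taken from sacctmgr's QOS output options and worth double-checking on your version):

```shell
# display the per-user TRES limit attached to the QOS
sacctmgr show qos normal format=Name,MaxTRESPU
```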
>
> Thanks again,
> Guillaume
>
> ------------------------------------------------------------------------
> *From: *"Groner, Rob" <rug262@psu.edu>
> *To: *slurm-users@lists.schedmd.com, "Guillaume COCHARD"
> <guillaume.cochard@cc.in2p3.fr>
> *Sent: *Tuesday, September 24, 2024 15:45:08
> *Subject: *Re: Max TRES per user and node
>
> You have the right idea.
>
> On that same page, you'll find MaxTRESPerUser, as a QOS parameter.
>
> You can create a QOS with the restrictions you'd like, and then in the
> partition definition, you give it that QOS. The QOS will then apply
> its restrictions to any jobs that use that partition.
>
> Rob
> ------------------------------------------------------------------------
> *From:* Guillaume COCHARD via slurm-users <slurm-users@lists.schedmd.com>
> *Sent:* Tuesday, September 24, 2024 9:30 AM
> *To:* slurm-users@lists.schedmd.com
> *Subject:* [slurm-users] Max TRES per user and node
> Hello,
>
> We are looking for a method to limit the TRES used by each user on a
> per-node basis. For example, we would like to limit the total memory
> allocation of jobs from a user to 200G per node.
>
> There is MaxTRESPerNode
> (https://slurm.schedmd.com/sacctmgr.html#OPT_MaxTRESPerNode), but
> unfortunately this is a per-job limit, not a per-user one.
>
> Ideally, we would like to apply this limit on partitions and/or QoS.
> Does anyone know if this is possible and how to achieve it?
>
> Thank you,
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-leave@lists.schedmd.com
>