As Rob already mentioned, this could be a way for you (I created the
partition temporarily online just for testing). You could also add
MaxTRES=node=1 for further restriction. We do something similar with a
QOS to restrict the number of CPUs per user in certain partitions.
scontrol create partition=testtres qos=maxtrespu200g maxtime=08:00:00
nodes=lt[10000-10003] DefMemPerCPU=940 MaxMemPerCPU=940 OverSubscribe=NO
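The QOS referenced above has to exist before the partition can use it. A minimal sketch of how it might be created (the name maxtrespu200g and the mem=200G value are assumptions inferred from this test, not necessarily our production config):

```shell
# hypothetical: create a QOS capping each user's total memory allocation
# at 200G, then point the test partition at it
sacctmgr add qos maxtrespu200g set MaxTRESPerUser=mem=200G
scontrol update partitionname=testtres qos=maxtrespu200g
```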
---
[root@levantetest ~]# squeue
  JOBID PARTITION     NAME     USER ST   TIME NODES NODELIST(REASON)
    862  testtres hostname  xxxxxxx PD   0:00     1 (QOSMaxMemoryPerUser)
    861  testtres hostname  xxxxxxx PD   0:00     1 (QOSMaxMemoryPerUser)
    860  testtres hostname  xxxxxxx  R   0:15     1 lt10000
    859  testtres hostname  xxxxxxx  R   0:22     1 lt10000
---
[k202068@levantetest ~]$ squeue
  JOBID PARTITION     NAME     USER ST   TIME NODES NODELIST(REASON)
    876  testtres hostname  xxxxxxx PD   0:00     1 (QOSMaxMemoryPerUser)
    875  testtres hostname  xxxxxxx PD   0:00     1 (QOSMaxMemoryPerUser)
    874  testtres hostname  xxxxxxx  R   9:09     1 lt10000
    873  testtres hostname  xxxxxxx  R   9:15     1 lt10000
    872  testtres hostname  xxxxxxx  R   9:22     1 lt10000
    871  testtres hostname  xxxxxxx  R   9:26     1 lt10000
--
Carsten Beyer
Abteilung Systeme
Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45a * D-20146 Hamburg * Germany
Phone: +49 40 460094-221
Fax: +49 40 460094-270
Email: beyer@dkrz.de
URL: http://www.dkrz.de
Geschäftsführer: Prof. Dr. Thomas Ludwig
Sitz der Gesellschaft: Hamburg
Amtsgericht Hamburg HRB 39784
On 24.09.24 at 16:58, Guillaume COCHARD via slurm-users wrote:
> > "So if they submit a 2nd job, that job can start but will have to
> go onto another node, and will again be restricted to 200G? So they
> can start as many jobs as there are nodes, and each job will be
> restricted to using 1 node and 200G of memory?"
>
> Yes, that's it. We already have MaxNodes=1, so a job can't be spread
> across multiple nodes.
>
> To be more precise, the limit should be per user, not per job. To
> illustrate, imagine we have 3 empty nodes and a 200G/user/node limit.
> If a user submits 10 jobs, each requesting 100G of memory, there
> should be 2 jobs running on each node and 4 jobs pending.
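The arithmetic in that example, spelled out as a quick shell sketch (all numbers are taken from the text above, not from a real cluster):

```shell
# 3 empty nodes, a 200G-per-user-per-node limit, 10 jobs of 100G each
nodes=3
per_node_limit=200   # G of memory a single user may allocate per node
job_mem=100          # G requested by each job
submitted=10

jobs_per_node=$(( per_node_limit / job_mem ))  # 2 jobs fit per node
running=$(( jobs_per_node * nodes ))           # 6 jobs can run at once
pending=$(( submitted - running ))             # 4 jobs stay pending
echo "running=$running pending=$pending"
```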
>
> Guillaume
>
> ------------------------------------------------------------------------
> *From: *"Groner, Rob" <rug262@psu.edu>
> *To: *"Guillaume COCHARD" <guillaume.cochard@cc.in2p3.fr>
> *Cc: *slurm-users@lists.schedmd.com
> *Sent: *Tuesday, September 24, 2024 16:37:34
> *Subject: *Re: Max TRES per user and node
>
> Ah, sorry, I didn't catch that from your first post (though you did
> say it).
>
> So, you are trying to limit the user to no more than 200G of memory on
> a single node? So if they submit a 2nd job, that job can start but
> will have to go onto another node, and will again be restricted to
> 200G? So they can start as many jobs as there are nodes, and each job
> will be restricted to using 1 node and 200G of memory? Or can they
> submit a job asking for 4 nodes, where they are limited to 200G on
> each node? Or are they limited to a single node, no matter how many jobs?
>
> Rob
>
> ------------------------------------------------------------------------
> *From:* Guillaume COCHARD <guillaume.cochard@cc.in2p3.fr>
> *Sent:* Tuesday, September 24, 2024 10:09 AM
> *To:* Groner, Rob <rug262@psu.edu>
> *Cc:* slurm-users@lists.schedmd.com
> *Subject:* Re: Max TRES per user and node
> Thank you for your answer.
>
> To test it I tried:
> sacctmgr update qos normal set maxtresperuser=cpu=2
> # Then in slurm.conf
> PartitionName=test […] qos=normal
>
> But then if I submit several 1-CPU jobs, only two start and the others
> stay pending, even though I have several nodes available. So it seems
> that MaxTRESPerUser is a QoS-wide limit, and doesn't limit TRES per
> user and per node but rather per user and per QoS (or rather per
> partition, since I applied the QoS to the partition). Did I miss
> something?
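One way to confirm what limit is actually stored on the QOS is to query it with sacctmgr (a sketch; the MaxTRESPU format field name is taken from sacctmgr's QOS output options and worth double-checking on your version):

```shell
# display the per-user TRES limit attached to the QOS
sacctmgr show qos normal format=Name,MaxTRESPU
```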
>
> Thanks again,
> Guillaume
>
> ------------------------------------------------------------------------
> *From: *"Groner, Rob" <rug262@psu.edu>
> *To: *slurm-users@lists.schedmd.com, "Guillaume COCHARD"
> <guillaume.cochard@cc.in2p3.fr>
> *Sent: *Tuesday, September 24, 2024 15:45:08
> *Subject: *Re: Max TRES per user and node
>
> You have the right idea.
>
> On that same page, you'll find MaxTRESPerUser, as a QOS parameter.
>
> You can create a QOS with the restrictions you'd like, and then in the
> partition definition, you give it that QOS. The QOS will then apply
> its restrictions to any jobs that use that partition.
>
> Rob
> ------------------------------------------------------------------------
> *From:* Guillaume COCHARD via slurm-users <slurm-users@lists.schedmd.com>
> *Sent:* Tuesday, September 24, 2024 9:30 AM
> *To:* slurm-users@lists.schedmd.com
> *Subject:* [slurm-users] Max TRES per user and node
> Hello,
>
> We are looking for a method to limit the TRES used by each user on a
> per-node basis. For example, we would like to limit the total memory
> allocation of jobs from a user to 200G per node.
>
> There is MaxTRESPerNode
> (https://slurm.schedmd.com/sacctmgr.html#OPT_MaxTRESPerNode), but
> unfortunately this is a per-job limit, not a per-user one.
>
> Ideally, we would like to apply this limit on partitions and/or QoS.
> Does anyone know if this is possible and how to achieve it?
>
> Thank you,
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-leave@lists.schedmd.com
>