Hello,
We are looking for a method to limit the TRES used by each user on a per-node basis. For example, we would like to limit the total memory allocation of jobs from a user to 200G per node.
There is MaxTRESPerNode (https://slurm.schedmd.com/sacctmgr.html#OPT_MaxTRESPerNode), but unfortunately this is a per-job limit, not a per-user one.
Ideally, we would like to apply this limit on partitions and/or QoS. Does anyone know if this is possible and how to achieve it?
Thank you,
You have the right idea.
On that same page, you'll find MaxTRESPerUser, as a QOS parameter.
You can create a QOS with the restrictions you'd like, and then in the partition definition, you give it that QOS. The QOS will then apply its restrictions to any jobs that use that partition.
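Something like this, for example (a minimal sketch; the QOS name, partition name, and node list are placeholders, and enforcement assumes AccountingStorageEnforce includes "limits" in slurm.conf):

    # create a QOS carrying the per-user memory cap
    sacctmgr add qos name=memcap MaxTRESPerUser=mem=200G

    # slurm.conf: attach the QOS to the partition
    PartitionName=compute Nodes=node[01-10] QOS=memcap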
Rob
Thank you for your answer.
To test it I tried:

    sacctmgr update qos normal set maxtresperuser=cpu=2

Then in slurm.conf:

    PartitionName=test […] qos=normal
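(For reference, the limit as applied can be double-checked with something like:

    sacctmgr show qos normal format=Name,MaxTRESPU

though the exact format field name may differ between Slurm versions.)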
But then if I submit several 1-cpu jobs only two start and the others stay pending, even though I have several nodes available. So it seems that MaxTRESPerUser is a QoS-wide limit, and doesn't limit TRES per user and per node but rather per user and QoS (or rather partition since I applied the QoS on the partition). Did I miss something?
Thanks again, Guillaume
De: "Groner, Rob" rug262@psu.edu À: slurm-users@lists.schedmd.com, "Guillaume COCHARD" guillaume.cochard@cc.in2p3.fr Envoyé: Mardi 24 Septembre 2024 15:45:08 Objet: Re: Max TRES per user and node
You have the right idea.
On that same page, you'll find MaxTRESPerUser, as a QOS parameter.
You can create a QOS with the restrictions you'd like, and then in the partition definition, you give it that QOS. The QOS will then apply its restrictions to any jobs that use that partition.
Rob
From: Guillaume COCHARD via slurm-users slurm-users@lists.schedmd.com Sent: Tuesday, September 24, 2024 9:30 AM To: slurm-users@lists.schedmd.com slurm-users@lists.schedmd.com Subject: [slurm-users] Max TRES per user and node Hello,
We are looking for a method to limit the TRES used by each user on a per-node basis. For example, we would like to limit the total memory allocation of jobs from a user to 200G per node.
There is MaxTRESperNode ( [ https://slurm.schedmd.com/sacctmgr.html#OPT_MaxTRESPerNode | https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fslurm.sche... ] ), but unfortunately, this is a per-job limit, not per user.
Ideally, we would like to apply this limit on partitions and/or QoS. Does anyone know if this is possible and how to achieve it?
Thank you,
Ah, sorry, I didn't catch that from your first post (though you did say it).
So, you are trying to limit the user to no more than 200G of memory on a single node? So if they submit a 2nd job, that job can start but will have to go onto another node, and will again be restricted to 200G? So they can start as many jobs as there are nodes, and each job will be restricted to using 1 node and 200G of memory? Or can they submit a job asking for 4 nodes, where they are limited to 200G on each node? Or are they limited to a single node, no matter how many jobs?
Rob
"So if they submit a 2 nd job, that job can start but will have to go onto another node, and will again be restricted to 200G? So they can start as many jobs as there are nodes, and each job will be restricted to using 1 node and 200G of memory?"
Yes that's it. We already have MaxNodes=1 so a job can't be spread on multiple nodes.
To be more precise, the limit should be by user and not by job. To illustrate, let's imagine we have 3 empty nodes and a 200G/user/node limit. If a user submits 10 jobs, each requesting 100G of memory, there should be 2 jobs running on each worker and 4 jobs pending.
Guillaume
De: "Groner, Rob" rug262@psu.edu À: "Guillaume COCHARD" guillaume.cochard@cc.in2p3.fr Cc: slurm-users@lists.schedmd.com Envoyé: Mardi 24 Septembre 2024 16:37:34 Objet: Re: Max TRES per user and node
Ah, sorry, I didn't catch that from your first post (though you did say it).
So, you are trying to limit the user to no more than 200G of memory on a single node? So if they submit a 2 nd job, that job can start but will have to go onto another node, and will again be restricted to 200G? So they can start as many jobs as there are nodes, and each job will be restricted to using 1 node and 200G of memory? Or can they submit a job asking for 4 nodes, where they are limited to 200G on each node? Or are they limited to a single node, no matter how many jobs?
Rob
From: Guillaume COCHARD guillaume.cochard@cc.in2p3.fr Sent: Tuesday, September 24, 2024 10:09 AM To: Groner, Rob rug262@psu.edu Cc: slurm-users@lists.schedmd.com slurm-users@lists.schedmd.com Subject: Re: Max TRES per user and node Thank you for your answer.
To test it I tried: sacctmgr update qos normal set maxtresperuser=cpu=2 # Then in slurm.conf PartitionName=test […] qos=normal
But then if I submit several 1-cpu jobs only two start and the others stay pending, even though I have several nodes available. So it seems that MaxTRESPerUser is a QoS-wide limit, and doesn't limit TRES per user and per node but rather per user and QoS (or rather partition since I applied the QoS on the partition). Did I miss something?
Thanks again, Guillaume
De: "Groner, Rob" rug262@psu.edu À: slurm-users@lists.schedmd.com, "Guillaume COCHARD" guillaume.cochard@cc.in2p3.fr Envoyé: Mardi 24 Septembre 2024 15:45:08 Objet: Re: Max TRES per user and node
You have the right idea.
On that same page, you'll find MaxTRESPerUser, as a QOS parameter.
You can create a QOS with the restrictions you'd like, and then in the partition definition, you give it that QOS. The QOS will then apply its restrictions to any jobs that use that partition.
Rob
From: Guillaume COCHARD via slurm-users slurm-users@lists.schedmd.com Sent: Tuesday, September 24, 2024 9:30 AM To: slurm-users@lists.schedmd.com slurm-users@lists.schedmd.com Subject: [slurm-users] Max TRES per user and node Hello,
We are looking for a method to limit the TRES used by each user on a per-node basis. For example, we would like to limit the total memory allocation of jobs from a user to 200G per node.
There is MaxTRESperNode ( [ https://slurm.schedmd.com/sacctmgr.html#OPT_MaxTRESPerNode | https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fslurm.sche... ] ), but unfortunately, this is a per-job limit, not per user.
Ideally, we would like to apply this limit on partitions and/or QoS. Does anyone know if this is possible and how to achieve it?
Thank you,
Ok, that example helped. Max of 200G on a single node, per user (not job). No limits on how many jobs and nodes they can use...just a limit of 200G per node per user.
And in that case, it's out of my realm of experience. 🙂 I'm relatively confident there IS a way...but I don't know it offhand. I was thinking maybe some combination of QOS, partition, and account limits....
Rob
Hi Guillaume,
As Rob already mentioned, this could be a way for you (the partition below was just created online temporarily, for testing). You could also add MaxTRES=node=1 for further restriction. We do something similar with QOS to restrict the number of CPUs per user in certain partitions.
sacctmgr create qos name=maxtrespu200G maxtrespu=mem=200G flags=denyonlimit
scontrol create partition=testtres qos=maxtrespu200g maxtime=08:00:00 nodes=lt[10000-10003] DefMemPerCPU=940 MaxMemPerCPU=940 OverSubscribe=NO
That results in:
4 jobs with 100G each:
---
[root@levantetest ~]# squeue
 JOBID PARTITION     NAME     USER ST   TIME NODES NODELIST(REASON)
   862  testtres hostname  xxxxxxx PD   0:00     1 (QOSMaxMemoryPerUser)
   861  testtres hostname  xxxxxxx PD   0:00     1 (QOSMaxMemoryPerUser)
   860  testtres hostname  xxxxxxx  R   0:15     1 lt10000
   859  testtres hostname  xxxxxxx  R   0:22     1 lt10000
6 jobs with 50G each:
---
[k202068@levantetest ~]$ squeue
 JOBID PARTITION     NAME     USER ST   TIME NODES NODELIST(REASON)
   876  testtres hostname  xxxxxxx PD   0:00     1 (QOSMaxMemoryPerUser)
   875  testtres hostname  xxxxxxx PD   0:00     1 (QOSMaxMemoryPerUser)
   874  testtres hostname  xxxxxxx  R   9:09     1 lt10000
   873  testtres hostname  xxxxxxx  R   9:15     1 lt10000
   872  testtres hostname  xxxxxxx  R   9:22     1 lt10000
   871  testtres hostname  xxxxxxx  R   9:26     1 lt10000
Best Regards,
Carsten
The trick, I think (and Guillaume can certainly correct me), is that the aim is to allow the user to run as many (up to) 200G mem jobs as they want...so long as they do not consume more than 200G on any single node. So, they could run 10 200G jobs...on 10 different nodes. So the mem limit isn't per user...it's per user per node. I think the QOS limit you created works as an OVERALL limit for the user, but doesn't allow per-node limiting.
Rob
I am pretty sure there is no way to do exactly a per user per node limit in SLURM. I cannot think of a good reason why one would do this. Can you explain?
I don't see why it matters, if you have two users each submitting two 200G jobs, whether the jobs are spread out over two nodes or one user's jobs both run on one node and the other user's jobs run on the other node.
If what you are really trying to limit is the amount of resources SLURM as a whole is using on a node, so SLURM never uses more than 200G out of the 400G on a node (for example), there are definitely ways to do that using MemSpecLimit on the node. You can even apportion CPU cores using CoreSpecCount/CpuSpecList and various cgroups v2 settings at the OS level.
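A sketch of that approach, with illustrative values (RealMemory and MemSpecLimit are in MB, and MemSpecLimit is only actually enforced with cgroup-based task containment):

    # slurm.conf: keep 200G of a 400G node out of SLURM's hands
    NodeName=node01 CPUs=32 RealMemory=409600 MemSpecLimit=204800 CoreSpecCount=2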
Otherwise, there may be a way with some fancy scripting in the Lua job submit plugin, or by playing around with the Feature/Helper plugin.
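For what the Lua route might look like, here is a minimal, hypothetical job_submit.lua sketch. It can only bound each job's own per-node memory request at submit time (the per-user-per-node aggregate is not visible there), so it approximates rather than solves the problem, and field names vary between Slurm versions:

    -- job_submit.lua: hypothetical sketch, verify field names on your version.
    local MAX_MEM_MB = 200 * 1024   -- 200G per-node cap

    function slurm_job_submit(job_desc, part_list, submit_uid)
        local mem = job_desc.pn_min_memory  -- requested memory per node, in MB
        -- Unset requests arrive as NO_VAL64; per-CPU requests (--mem-per-cpu,
        -- flagged in the top bit) are not handled in this sketch.
        if mem ~= nil and mem ~= slurm.NO_VAL64 and mem > MAX_MEM_MB then
            slurm.log_user("rejected: more than 200G of memory requested per node")
            return slurm.ERROR
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end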
-- Paul Raines (http://help.nmr.mgh.harvard.edu)
Hello,
Thank you all for your answers.
Carsten, as Rob said, we need a limit per node, not only per user.
Paul, we know we are asking for something quite unorthodox. The thing is, we overbook the memory on our cluster (i.e., if a worker has 200G of memory, Slurm can allocate up to 280G on it). In our use case (HTC, with lots of small, inefficient jobs), this approach has worked well to improve our cluster usage (up to 40% more jobs without adding any hardware!). However, if the cluster is somewhat empty and a user submits lots of big, efficient jobs, we can of course experience some OOM kills.
So far, the tradeoff has been largely in our favor, so we are okay with this, but it would be nice to avoid this situation altogether. Having a maximum TRES per user and per node would ensure a good mix of jobs from different users, so if some jobs were highly efficient, others would be inefficient enough to counterbalance that.
Once again, I know this is sub-optimal and that we should probably educate our users so they stop wasting resources, but in the meantime, this approach works quite well, so we are looking to improve it until we no longer need it.
Guillaume