Hello,
I have three nodes, each serving 2 GPUs. I would like to limit (with a QoS?) a user to only one GPU on each server, while still allowing the user to use three GPUs simultaneously if each GPU is on a different server. With the QoS "sacctmgr add qos test-limit-GPUs MaxJobsPerUser=3 MaxTRESPerUser=gres/gpu=1" I can limit the user to one GPU, but then the user cannot run another job on a GPU of another server. How should I configure a QoS (or use another method) to allow more than one job requesting GPUs, but never on the same server?
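Roughly, in slurm.conf / gres.conf terms the layout is something like the following (node names and device paths here are only placeholders, not the real ones):

    # slurm.conf -- three nodes with two GPUs each (placeholder node names)
    GresTypes=gpu
    NodeName=gpunode[1-3] Gres=gpu:2 State=UNKNOWN

    # gres.conf on each node (placeholder device files)
    Name=gpu File=/dev/nvidia[0-1]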
Thanks.
You may want to look at MaxTRESPerNode and possibly MaxTRESPerJob. Doing it PerUser means all running jobs for that user, which may not be what you want.
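Untested, but the change would look something like this, reusing the QoS name from your example:

    # untested sketch; limits GPUs per node for a single job, not across a user's jobs
    sacctmgr modify qos where name=test-limit-GPUs set MaxTRESPerNode=gres/gpu=1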
Brian Andrus
This is an interesting question, and I was thinking the same as Brian.
For the sake of discussion, I’m not sure MaxTRESPerNode will achieve the desired job distribution, because the limit is applied per job, not across all of a user's jobs. But… I’ve never used this limit, and I may be interpreting the docs incorrectly. Definitely worth testing.
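A quick test might be to submit a few one-GPU jobs under that QoS and watch where they land (sketch only, reusing the test-limit-GPUs QoS name from the original post):

    # submit three single-GPU jobs under the test QoS
    for i in 1 2 3; do sbatch --qos=test-limit-GPUs --gres=gpu:1 --wrap="sleep 300"; done
    # check which nodes they were placed on
    squeue -u $USER -o "%.10i %.8T %N"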
Combining this with SelectTypeParameters=CR_LLN would help distribute workloads across the least-loaded nodes, but it also wouldn’t guarantee one job per user per node.
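For reference, that would be something like the following in slurm.conf (assuming cons_tres with core/memory tracking, which may not match the original poster's setup):

    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core_Memory,CR_LLN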
That could be achieved by the user structuring their jobs and selecting the right combination of sbatch/srun options, as in the sketch below.
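For example, with placeholder node names and a placeholder batch script, the user could pin each one-GPU job to a different node:

    sbatch --gres=gpu:1 --nodelist=gpunode1 job.sh
    sbatch --gres=gpu:1 --nodelist=gpunode2 job.sh
    sbatch --gres=gpu:1 --nodelist=gpunode3 job.sh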
I’m not sure if there’s a baked-in set of options that will achieve all requirements. It might require a custom select plugin?!
I don’t know if any of this moves the needle... Good question! Excited to learn more and to see if a solution exists. Don’t forget to share what you find.
Thanks,
Sebastian Smith
Seattle Children’s Hospital
DevOps Engineer, Principal
Email: sebastian.smith@seattlechildrens.org
Web: https://seattlechildrens.org/