Hello,
I have added a new QOS with these parameters: sacctmgr add qos test-GPUs MaxJobsPerUser=6 MaxTRESPerUser=gres/gpu=1 MaxSubmitJobsPerUser=25. With it, each user is limited to 6 running jobs, a total of 25 pending plus running jobs, and 1 GPU. I have applied this QOS directly to a partition in slurm.conf.
When a user submits a job to that partition requesting 2 or more GPUs, the job remains "PD" (pending) and shows "QOSMaxGRESPerUser" in the NODELIST(REASON) column. Would it be possible to reject the job directly, so that it does not stay in the queue? That is what already happens with the submit limit: if I submit 50 jobs, then after job number 25 I get the message "sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits) sbatch: error: QOSMaxSubmitJobPerUserLimit" 25 times.
Thanks.
Hi,
Check the QOS flag DenyOnLimit: https://slurm.schedmd.com/qos.html#qos_other. Setting it on your QOS will cause Slurm to reject the job at submission if it exceeds Max or Grp limits. I think setting it will achieve the behavior you’re after.
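As a rough sketch of what that could look like for the QOS from the original message (not tested against this cluster): the QOS name test-GPUs comes from the question, while the partition name gpu-part is a hypothetical placeholder, since the real partition name was not given. Note that Flags= replaces any flags already set on the QOS; Flags+= would add to them instead.

```shell
# Sketch only: run as a Slurm administrator on a host with sacctmgr access.
# "gpu-part" is a hypothetical partition name; substitute the real one.
PARTITION="gpu-part"

# Set the DenyOnLimit flag on the existing QOS (sacctmgr asks for confirmation).
sacctmgr modify qos test-GPUs set Flags=DenyOnLimit

# Check that the flag and the per-user TRES limit are in place.
sacctmgr show qos test-GPUs format=Name,Flags,MaxTRESPU

# A job requesting 2 GPUs should now be rejected at submission
# instead of pending with reason QOSMaxGRESPerUser.
sbatch --partition="$PARTITION" --gres=gpu:2 --wrap="hostname"
```

If the job is still accepted and left pending, it may be worth checking that the partition's QOS= setting in slurm.conf really points at test-GPUs and that the configuration has been reloaded.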
Best,
Sebastian Smith
Seattle Children’s Hospital
DevOps Engineer, Principal
Email: sebastian.smith@seattlechildrens.org
Web: https://seattlechildrens.org
--
From: Gestió Servidors via slurm-users <slurm-users@lists.schedmd.com>
Date: Wednesday, October 22, 2025 at 23:26
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Job remains "PENDING" with reason "QOSMaxGRESPerUser"