[slurm-users] [EXT] rejecting jobs that exceed QOS limits
Sean Crosby
scrosby at unimelb.edu.au
Sat May 29 04:08:38 UTC 2021
Hi Paul,
Try setting the DenyOnLimit flag on the QOS:
sacctmgr modify qos gputest set flags=DenyOnLimit
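A rough sketch of how you could apply and then check this, in case it helps. The partition name "gputest" in the sbatch line is only my assumption (you did not say what the partition is called), and the small run() dry-run wrapper is just a hypothetical convenience so the commands can be reviewed before they touch the accounting database:

```shell
#!/bin/sh
# Sketch only: with DRY_RUN=1 (the default here) the commands are printed,
# not executed. Set DRY_RUN=0 on a host with Slurm admin rights to run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Set the flag; -i commits without the interactive confirmation prompt.
run sacctmgr -i modify qos gputest set flags=DenyOnLimit

# Confirm the flag now shows up next to the existing limits.
run sacctmgr show qos gputest format=Name,Flags,MaxTRESPerUser,MaxJobsPerUser

# Ask for 2 GPUs under the QOS; this should now be refused at submit time.
# The partition name is an assumption, substitute your own.
run sbatch --partition=gputest --qos=gputest --gres=gpu:2 --wrap=hostname
```

As I understand the documentation, DenyOnLimit only rejects a job that breaks a Max limit when considered on its own, so the 2-GPU or more-than-8-core request should be refused, while a second job that fits the limits but runs into MaxJobsPerUser=1 would still queue until the first one finishes.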
Sean
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Paul Raines <raines at nmr.mgh.harvard.edu>
Sent: Saturday, 29 May 2021 12:48
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
Subject: [EXT] [slurm-users] rejecting jobs that exceed QOS limits
I want to dedicate one of our GPU servers for testing where
users are only allowed to run 1 job at a time using 1 GPU and
8 cores of the server. So I put one server in a partition on its
own and set up a QOS for it as follows:
sacctmgr add qos gputest
sacctmgr modify qos gputest set priority=20
sacctmgr modify qos gputest set MaxJobsPerUser=1
sacctmgr modify qos gputest set MaxTRESPerUser=cpu=8,gres/gpu=1
sacctmgr show qos format=name,priority,MaxTRESPerUser,MaxJobsPerUser
In slurm.conf I have:
AccountingStorageEnforce=safe,qos
AccountingStorageTRES=Billing,CPU,Energy,Mem,Node,FS/Disk,Pages,VMem,gres/gpu
EnforcePartLimits=ALL
This works, but when I submit a job asking for 2 or more GPUs, instead
of being immediately rejected it queues but never runs. The same happens
if I ask for more than 8 cores.
Is there a way to get it immediately rejected?