[slurm-users] DenyOnLimit flag ignored for QOS, always rejects?
Renfro, Michael
Renfro at tntech.edu
Fri Jan 25 16:35:15 UTC 2019
Hey, folks. I'm running Slurm 17.02.10 with Bright Cluster Manager 8.0.
I wanted to limit queue-stuffing on my GPU nodes, similar to what AssocGrpCPURunMinutesLimit does. The current goal is to restrict each user to 8 active or queued jobs in the production GPU partition, and to have any additional jobs pend (not be rejected) so other users keep fair access to the queue. I'd be fine with a time limit instead of a job-count limit, too.
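(For context, the limit behind AssocGrpCPURunMinutesLimit is the association-level GrpTRESRunMins, set with something like the command below; "bob" and the value are placeholders rather than anything I've actually configured here:

$ sacctmgr modify user bob set GrpTRESRunMins=cpu=2880

I only mention it to show the kind of limit I'm after.)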
I'd assumed a partition QOS was the way to go, since the sacctmgr man page reads, in part:
Flags Used by the slurmctld to override or enforce certain characteristics.
Valid options are
DenyOnLimit
If set, jobs using this QOS will be rejected at submission time if they do not conform to the QOS 'Max' limits. Group limits will also be treated like 'Max' limits as well and will be denied if they go over. By default jobs that go over these limits will pend until they conform. This currently only applies to QOS and Association limits.
So if I avoid setting the DenyOnLimit flag, extra jobs should pend until they conform, right? My QOS settings for 8 active or pending GPU jobs per user are as follows:
$ sacctmgr list qos normal,gpu format=name,priority,gracetime,preemptmode,usagefactor,grptresrunmin,MaxSubmitJobsPerUser,flags
      Name   Priority  GraceTime PreemptMode UsageFactor GrpTRESRunMin MaxSubmitPU                Flags
---------- ---------- ---------- ----------- ----------- ------------- ----------- --------------------
    normal          0   00:00:00     cluster    1.000000
       gpu          0   00:00:00     cluster    1.000000                         8
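For what it's worth, the gpu QOS was set up with roughly the following (reconstructed from the settings above; I've never set anything in the Flags field):

$ sacctmgr add qos gpu
$ sacctmgr modify qos gpu set MaxSubmitJobsPerUser=8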
Partition settings, where the gpu QOS is applied to jobs in the gpu partition:
$ egrep 'PartitionName=(batch|gpu) ' /etc/slurm/slurm.conf
PartitionName=batch Default=YES MinNodes=1 MaxNodes=40 DefaultTime=1-00:00:00 MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=4000 AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=node[001-040]
PartitionName=gpu Default=NO MinNodes=1 DefaultTime=1-00:00:00 MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=4000 AllowAccounts=ALL AllowQos=ALL LLN=NO MaxCPUsPerNode=16 QoS=gpu ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=gpunode[001-004]
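(If it helps rule out a stale configuration, the live partition settings can be double-checked with:

$ scontrol show partition gpu

which should show the QoS=gpu field from slurm.conf.)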
The original submission, specifying CPUs, time, GRES, QOS, and partition, accepts jobs 1-8 and rejects job 9, even though I haven't set the DenyOnLimit flag:
$ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 --gres=gpu --qos=gpu --partition=gpu omp_hw.sh; done
Submitted batch job 150548
Submitted batch job 150549
Submitted batch job 150550
Submitted batch job 150551
Submitted batch job 150552
Submitted batch job 150553
Submitted batch job 150554
Submitted batch job 150555
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
$ scancel -u $USER -p gpu
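(For the record, a quick check between submitting and cancelling, something like

$ squeue -u $USER -p gpu -o '%i %P %q %T'

would show which QOS each of the eight accepted jobs is using; I can capture that output on a re-run if it would help.)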
Minimized down to just CPUs, time, and partition, with the same result, since the gpu QOS is automatically applied to jobs in the gpu partition:
$ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 --partition=gpu omp_hw.sh; done
Submitted batch job 150556
Submitted batch job 150557
Submitted batch job 150558
Submitted batch job 150559
Submitted batch job 150560
Submitted batch job 150561
Submitted batch job 150562
Submitted batch job 150563
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
$ scancel -u $USER -p gpu
Running in the batch partition with the normal QOS, all 9 jobs are accepted:
$ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 --partition=batch omp_hw.sh; done
Submitted batch job 150564
Submitted batch job 150565
Submitted batch job 150566
Submitted batch job 150567
Submitted batch job 150568
Submitted batch job 150569
Submitted batch job 150570
Submitted batch job 150571
Submitted batch job 150572
$ scancel -u $USER -p batch
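(If it's useful, sacct can confirm which QOS those batch-partition jobs ran under, e.g.

$ sacct -j 150564 --format=JobID,Partition,QOS,State

I've left that output out here.)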
Running in the batch partition with the gpu QOS explicitly specified, jobs 1-8 are accepted and job 9 is rejected:
$ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 --partition=batch --qos=gpu omp_hw.sh; done
Submitted batch job 150573
Submitted batch job 150574
Submitted batch job 150575
Submitted batch job 150576
Submitted batch job 150577
Submitted batch job 150578
Submitted batch job 150579
Submitted batch job 150580
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
$ scancel -u $USER -p batch
So the rejection appears to be tied to the gpu QOS itself, regardless of partition. What might I have missed?
--
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Tech University