[slurm-users] DenyOnLimit flag ignored for QOS, always rejects?

Renfro, Michael Renfro at tntech.edu
Fri Jan 25 17:37:17 UTC 2019


Resolved, thanks to Adam Hough on sighpcsyspros Slack.

Before, I had MaxSubmitJobsPerUser=8 when I really wanted MaxJobsPerUser=8. A submit limit is enforced at submission time whether or not DenyOnLimit is set, so the ninth job was rejected outright; a running-job limit just leaves the extra jobs pending.

- MaxJobsPerUser: the maximum number of jobs a user can have running at a given time.
- MaxSubmitJobsPerUser: the maximum number of jobs a user can have running and pending at a given time.
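
The change itself comes down to one sacctmgr call, roughly like this (a sketch rather than the exact command history):

    # sketch: set the running-job limit and clear the old submit limit (-1 clears a limit)
    $ sacctmgr modify qos gpu set MaxJobsPerUser=8 MaxSubmitJobsPerUser=-1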

Now:

    $ sacctmgr list qos normal,gpu format=name,priority,gracetime,preemptmode,usagefactor,grptresrunmin,MaxSubmitJobsPerUser,maxjobsperuser,flags
          Name   Priority  GraceTime PreemptMode UsageFactor GrpTRESRunMin MaxSubmitPU MaxJobsPU                Flags
    ---------- ---------- ---------- ----------- ----------- ------------- ----------- --------- --------------------
           gpu          0   00:00:00     cluster    1.000000                                   8
        normal          0   00:00:00     cluster    1.000000

$ for n in $(seq 9); do sbatch --time=00:10:00 --partition=gpu omp_hw.sh; done
Submitted batch job 150670
Submitted batch job 150671
Submitted batch job 150672
Submitted batch job 150673
Submitted batch job 150674
Submitted batch job 150675
Submitted batch job 150676
Submitted batch job 150677
Submitted batch job 150678
[renfro@login hw]$ squeue -u $USER -p gpu
JOBID  PARTI       NAME     USER ST         TIME CPUS NODES MIN_MEMORY NODELIST(REASON) GRES
150678   gpu  omp_hw.sh   renfro PD         0:00 1    1     4000M      (QOSMaxJobsPerUs (null)
150670   gpu  omp_hw.sh   renfro  R         0:06 1    1     4000M      gpunode001       (null)
150671   gpu  omp_hw.sh   renfro  R         0:06 1    1     4000M      gpunode001       (null)
150672   gpu  omp_hw.sh   renfro  R         0:06 1    1     4000M      gpunode001       (null)
150673   gpu  omp_hw.sh   renfro  R         0:06 1    1     4000M      gpunode001       (null)
150674   gpu  omp_hw.sh   renfro  R         0:06 1    1     4000M      gpunode001       (null)
150675   gpu  omp_hw.sh   renfro  R         0:06 1    1     4000M      gpunode001       (null)
150676   gpu  omp_hw.sh   renfro  R         0:06 1    1     4000M      gpunode001       (null)
150677   gpu  omp_hw.sh   renfro  R         0:06 1    1     4000M      gpunode001       (null)
$ scancel -u $USER -p gpu

> On Jan 25, 2019, at 10:35 AM, Renfro, Michael <Renfro at tntech.edu> wrote:
> 
> Hey, folks. Running Slurm 17.02.10 with Bright Cluster Manager 8.0.
> 
> I wanted to limit queue-stuffing on my GPU nodes, similar to what AssocGrpCPURunMinutesLimit does. The current goal is to restrict a user to having 8 active or queued jobs in the production GPU partition, and block (not reject) other jobs to allow other users fair access to the queue. I'm good with a time limit instead of a job number limit, too.
> 
> I'd assumed a partition QOS was the way to go, as the sacctmgr man page reads in part:
> 
>    Flags  Used by the slurmctld to override or enforce certain characteristics.
>           Valid options are
> 
>           DenyOnLimit
>             If set, jobs using this QOS will be rejected at submission time if they do not conform to the QOS 'Max' limits. Group limits will also be treated like 'Max' limits as well and will be denied if they go over. By default jobs that go over these limits will pend until they conform. This currently only applies to QOS and Association limits.
> 
> So if I avoid setting the DenyOnLimit flag, extra jobs will pend until they conform, right? My QOS settings for 8 active or pending GPU jobs per user are as follows:
> 
>    $ sacctmgr list qos normal,gpu format=name,priority,gracetime,preemptmode,usagefactor,grptresrunmin,MaxSubmitJobsPerUser,flags
>          Name   Priority  GraceTime PreemptMode UsageFactor GrpTRESRunMin MaxSubmitPU                Flags
>    ---------- ---------- ---------- ----------- ----------- ------------- ----------- --------------------
>        normal          0   00:00:00     cluster    1.000000
>           gpu          0   00:00:00     cluster    1.000000                         8
> 
> Partition settings, where the gpu QOS is applied to jobs in the gpu partition:
> 
>    $ egrep 'PartitionName=(batch|gpu) ' /etc/slurm/slurm.conf
>    PartitionName=batch Default=YES MinNodes=1 MaxNodes=40 DefaultTime=1-00:00:00 MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=4000 AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=node[001-040]
>    PartitionName=gpu Default=NO MinNodes=1 DefaultTime=1-00:00:00 MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=4000 AllowAccounts=ALL AllowQos=ALL LLN=NO MaxCPUsPerNode=16 QoS=gpu ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=gpunode[001-004]
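> 
> As a sanity check that the partition-level QOS is actually attached (an illustrative command; the exact scontrol output layout varies by version, but it should include a QoS=gpu field for this partition):
> 
>    # illustrative: pull the QoS field out of the partition record
>    $ scontrol show partition gpu | grep -o 'QoS=[^ ]*'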
> 
> Original submission specifying CPUs, time, GRES, QOS, and partition, which accepts jobs 1-8 and rejects job 9 even though I haven't set the DenyOnLimit flag:
> 
>    $ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 --gres=gpu --qos=gpu --partition=gpu omp_hw.sh; done
>    Submitted batch job 150548
>    Submitted batch job 150549
>    Submitted batch job 150550
>    Submitted batch job 150551
>    Submitted batch job 150552
>    Submitted batch job 150553
>    Submitted batch job 150554
>    Submitted batch job 150555
>    sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
>    $ scancel -u $USER -p gpu
> 
> Minimized to just CPUs, time, and partition, with the same results, since the gpu QOS is automatically applied to jobs in the gpu partition:
> 
>    $ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 --partition=gpu omp_hw.sh; done
>    Submitted batch job 150556
>    Submitted batch job 150557
>    Submitted batch job 150558
>    Submitted batch job 150559
>    Submitted batch job 150560
>    Submitted batch job 150561
>    Submitted batch job 150562
>    Submitted batch job 150563
>    sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
>    $ scancel -u $USER -p gpu
> 
> Running in the batch partition with the normal QOS, all 9 jobs are accepted:
> 
>    $ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 --partition=batch omp_hw.sh; done
>    Submitted batch job 150564
>    Submitted batch job 150565
>    Submitted batch job 150566
>    Submitted batch job 150567
>    Submitted batch job 150568
>    Submitted batch job 150569
>    Submitted batch job 150570
>    Submitted batch job 150571
>    Submitted batch job 150572
>    $ scancel -u $USER -p batch
> 
> Running in the batch partition with the gpu QOS explicitly specified, accepts jobs 1-8, and rejects job 9:
> 
>    $ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 --partition=batch --qos=gpu omp_hw.sh; done
>    Submitted batch job 150573
>    Submitted batch job 150574
>    Submitted batch job 150575
>    Submitted batch job 150576
>    Submitted batch job 150577
>    Submitted batch job 150578
>    Submitted batch job 150579
>    Submitted batch job 150580
>    sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
>    $ scancel -u $USER -p batch
> 
> So the behavior appears to be triggered by the gpu QOS. What might I have missed?
> 
> -- 
> Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
> 931 372-3601     / Tennessee Tech University
> 
> 



