[slurm-users] Limiting the number of CPU

Brian W. Johanson bjohanso at psc.edu
Fri Nov 8 13:58:40 UTC 2019


Suksmandhira,
That QOS specifies a walltime, CPU, and memory limit.  From the job script, it appears you are within the CPU limit.  But the job script specifies neither walltime nor memory, and your squeue output does not show those values (or CPU) for the job.
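For example, the job script could request walltime and memory explicitly. This is only a sketch: the values are assumptions chosen to sit within the limits shown in your sacctmgr output (MaxWall 00:01:00, mem=1G), and I have not confirmed that this alone clears the pending state.

```shell
# Walltime request; assumed value, within the QOS MaxWall of 00:01:00
#SBATCH --time=00:01:00
# Memory per node; assumed value, within the QOS limit of mem=1G
#SBATCH --mem=1G
```

These lines would sit with the other #SBATCH directives, before the srun line.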
'scontrol show job=JOBID' will show all of those values.  Adding flags=DenyOnLimit to the QOS will reject a job at submission when it exceeds a QOS limit, so that jobs which can never run do not sit in the queue.
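A rough sketch of both steps, in case it helps. The helper name show_limit_fields is made up for illustration, and the field names it filters on (TimeLimit, NumCPUs, TRES or ReqTRES, MinMemoryNode or MinMemoryCPU) are what I would expect in scontrol output; they can differ between Slurm versions, so adjust the pattern to what your version prints.

```shell
#!/bin/bash
# Hypothetical helper: keep only the limit-related fields of a job record.
# It reads `scontrol show job` text on stdin, so it does not call Slurm itself.
show_limit_fields() {
    # scontrol prints space-separated Key=Value pairs; put one pair per line,
    # then keep the fields that matter for QOS limits (names assumed, see above).
    tr -s ' ' '\n' |
        grep -E '^(JobState|Reason|TimeLimit|NumCPUs|TRES|ReqTRES|MinMemoryNode|MinMemoryCPU)='
}

# On the cluster (68 is the job ID from the squeue output):
#   scontrol show job 68 | show_limit_fields
#
# To reject over-limit jobs at submit time instead of leaving them pending
# (run as a Slurm administrator):
#   sacctmgr modify qos normal_compute set flags=DenyOnLimit
```

If TimeLimit or the memory field comes back larger than what the QOS allows, that would be the first thing to fix in the job script.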

-b

On 11/7/19 9:37 PM, Sukman wrote:
> Hi all,
>
> I am currently having a problem in limiting the number of CPU used for running a job.
> I tried to limit the CPU to just only 2 from the maximum 56.
> But, when I run the job, using only 1 CPU, the QOS has been reached already.
> When I set the CPU to 56, the job runs finely.
>
> Does anyone have any suggestion regarding this problem?
>
>
> Following is the details of the problem.
>
>
> My node has 56 cores (2sockets x 28cores).
>
>
> I configured already slurm.conf by enabling the qos/limit enforcement.
>
> #slurm.conf
> AccountingStorageEnforce=qos,limits
>
>
> For QOS itself, I just tried applying a simple limit-CPU number to be 2.
>
> #QOS
> sacctmgr show qos where Name=normal_compute format=Name,Priority,UsageFactor,MaxWall,MaxTRESPU
>        Name   Priority UsageFactor     MaxWall     MaxTRESPU
> ---------- ---------- ----------- ----------- -------------
> normal_co+         10    1.000000    00:01:00  cpu=2,mem=1G
>
>
> I then applied the QOS to a specific user, sukman.
>
> #QOS-defined user
> sacctmgr list association where User=sukman format=User,QOS,
>        User                  QOS
> ---------- --------------------
>      sukman       normal_compute
>
>
> Then, I tried to run a simple bash command, hostname, by just using 1 node, 1 task, and 1 CPU
>
> #!/bin/bash
> #SBATCH --job-name=hostname
> #SBATCH --nodes=1
> #SBATCH --ntasks=1
> #SBATCH --ntasks-per-node=1
> #SBATCH --cpus-per-task=1
> #SBATCH --nodelist=cn110
>
> srun hostname
>
>
> However, the QOS has been reached already.
>
> squeue
>               JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>                  68      defq hostname   sukman PD       0:00      1 (QOSMaxCpuPerUserLimit)
>
>
> When I change the CPU limit to the max cores number in a server, 56 cores
>
> sacctmgr show qos where Name=normal_compute format=Name,Priority,UsageFactor,MaxWall,MaxTRESPU
>        Name   Priority UsageFactor     MaxWall     MaxTRESPU
> ---------- ---------- ----------- ----------- -------------
> normal_co+         10    1.000000    00:01:00 cpu=56,mem=1G
>
>
> the script runs perfectly.
>
> cat slurm-68.out
> cn110
>
>
>
> --------------------------------
>
> Suksmandhira H
> ITB Indonesia



