[slurm-users] Limiting the number of CPU

Sukman sukman at pusat.itb.ac.id
Fri Nov 8 02:37:10 UTC 2019


Hi all,

I am having trouble limiting the number of CPUs that a job can use.
I tried to limit the CPUs to just 2 out of the maximum of 56.
But when I submit a job that requests only 1 CPU, it is held as if the QOS limit had already been reached.
When I set the CPU limit to 56, the job runs fine.

Does anyone have any suggestions regarding this problem?


The details of the problem follow.


My node has 56 cores (2 sockets x 28 cores).


I have already configured slurm.conf to enforce QOS and limits.

#slurm.conf
AccountingStorageEnforce=qos,limits
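
One way to confirm that the running controller has picked up this setting is to inspect the live configuration. The sketch below is only an illustration and not part of the original setup: `enforces` is a hypothetical helper that reads slurm.conf or `scontrol show config` style text on standard input and succeeds only if the given keyword is listed on the AccountingStorageEnforce line.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): succeed if KEYWORD is listed on the
# AccountingStorageEnforce line of the config text read from standard input.
enforces() {
    local keyword="$1"
    awk -v kw="$keyword" '
        # Accept both "Key=value" (slurm.conf) and "Key = value" (scontrol output).
        # Commented-out lines start with "#" and therefore do not match.
        /^[ \t]*AccountingStorageEnforce[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")          # drop everything up to the "="
            n = split($0, items, /,/)         # one entry per enforced option
            for (i = 1; i <= n; i++) {
                gsub(/[ \t]/, "", items[i])   # trim stray whitespace
                if (tolower(items[i]) == tolower(kw)) found = 1
            }
        }
        END { exit(found ? 0 : 1) }
    '
}
```

On a machine with Slurm installed it could be used as `scontrol show config | enforces limits && echo "limits are enforced"`.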


For the QOS itself, I just applied a simple per-user limit of 2 CPUs.

#QOS
sacctmgr show qos where Name=normal_compute format=Name,Priority,UsageFactor,MaxWall,MaxTRESPU
      Name   Priority UsageFactor     MaxWall     MaxTRESPU 
---------- ---------- ----------- ----------- ------------- 
normal_co+         10    1.000000    00:01:00  cpu=2,mem=1G


I then applied the QOS to a specific user, sukman.

#QOS-defined user
sacctmgr list association where User=sukman format=User,QOS,
      User                  QOS 
---------- -------------------- 
    sukman       normal_compute


Then I tried to run a simple command, hostname, using just 1 node, 1 task, and 1 CPU:

#!/bin/bash
#SBATCH --job-name=hostname
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --nodelist=cn110

srun hostname
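
On paper, this request fits under the limit. The sketch below spells out that arithmetic using the values from this post; `tres_value` is a hypothetical helper written for illustration, not a Slurm command, and it only parses the text of a TRES string.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one resource from a TRES string
# such as "cpu=2,mem=1G". Fails without output if the name is absent.
tres_value() {
    local tres="$1" name="$2" entry
    local IFS=','
    for entry in $tres; do
        if [ "${entry%%=*}" = "$name" ]; then
            printf '%s\n' "${entry#*=}"
            return 0
        fi
    done
    return 1
}

limit=$(tres_value "cpu=2,mem=1G" cpu)   # MaxTRESPU of the normal_compute QOS
requested=$((1 * 1))                     # --ntasks=1 times --cpus-per-task=1

if [ "$requested" -le "$limit" ]; then
    echo "request ($requested CPU) is within the per-user limit ($limit CPUs)"
else
    echo "request ($requested CPU) exceeds the per-user limit ($limit CPUs)"
fi
```

This only compares what the script asks for with the configured limit. It says nothing about how many CPUs Slurm actually counts for the job.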


However, the job stays pending, reporting that the QOS limit has already been reached.

squeue 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                68      defq hostname   sukman PD       0:00      1 (QOSMaxCpuPerUserLimit)
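
To see how many CPUs Slurm itself counts for the pending job, the job record can be inspected with `scontrol show job`. The following is a minimal sketch under the assumption that the output consists of whitespace-separated KEY=value fields (for example NumCPUs); `job_field` is a hypothetical helper, not a Slurm command.

```shell
#!/bin/bash
# Hypothetical helper: print the value of the first KEY=value field named KEY
# found in `scontrol show job` style text on standard input.
job_field() {
    local key="$1"
    # Put every whitespace-separated field on its own line, then match the key
    # only at the start of a field so that e.g. "TRES" does not match "ReqTRES".
    tr -s ' \t' '\n\n' | awk -v k="$key" '
        index($0, k "=") == 1 { print substr($0, length(k) + 2); found = 1; exit }
        END { exit(found ? 0 : 1) }
    '
}
```

With Slurm available, it could be used as `scontrol show job 68 | job_field NumCPUs`. A value larger than the QOS limit of 2 would suggest that the job is being counted as using more CPUs than the script requests.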


When I change the CPU limit to the maximum number of cores in the server, 56 cores,

sacctmgr show qos where Name=normal_compute format=Name,Priority,UsageFactor,MaxWall,MaxTRESPU
      Name   Priority UsageFactor     MaxWall     MaxTRESPU 
---------- ---------- ----------- ----------- ------------- 
normal_co+         10    1.000000    00:01:00 cpu=56,mem=1G


the script runs perfectly.

cat slurm-68.out 
cn110



--------------------------------

Suksmandhira H
ITB Indonesia


