[slurm-users] srun and --cpus-per-task

Hermann Schwärzler hermann.schwaerzler at uibk.ac.at
Thu Mar 24 14:55:35 UTC 2022


Hi Durai,

I see the same thing as you on our test-cluster that has
ThreadsPerCore=2
configured in slurm.conf.

The double-foo goes away with this:
srun --cpus-per-task=1 --hint=nomultithread echo foo
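
The same hint should work in a batch script as well; a minimal,
untested sketch:

#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --hint=nomultithread
srun echo foo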

Having multithreading enabled leads imho to surprising behaviour of
Slurm. My impression is that using it makes the concept of "a CPU" in
Slurm somewhat fuzzy: it becomes ambiguous what you get when you use
the cpu-related options of srun, sbatch and salloc: a CPU-core or a
CPU-thread?

I think what you found is a bug.

If you run

for c in {4..1}
do
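  # request $c CPUs per task and print the cpu-binding mask Slurm assigns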
  echo "## $c ###"
  srun -c $c bash -c 'echo $SLURM_CPU_BIND_LIST'
done

you will get:

## 4 ###
0x003003
## 3 ###
0x003003
## 2 ###
0x001001
## 1 ###
0x000001,0x001000
0x000001,0x001000

You see: requesting 4 and 3 CPUs results in the same cpu-binding, as 
both need two CPU-cores with 2 threads each. In the "3" case one of 
the four threads stays unused but of course is not free for another job.
In the "1" case I would expect to see the same binding as in the "2" 
case. If you combine (OR) the two values in the list you *do* get the 
same mask, but it is a list of two values (one per task, apparently) 
and this might be the origin of the problem.
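
To make those masks easier to read, here is a small throwaway loop
that expands a mask into the logical CPUs it covers (this assumes a
node with 24 logical CPUs where core i's two hardware threads are CPUs
i and i+12, which is what the masks above suggest for our test node):

mask=0x003003   # one of the SLURM_CPU_BIND_LIST values from above
for ((cpu = 0; cpu < 24; cpu++)); do
  # print every logical CPU whose bit is set in the mask
  (( (mask >> cpu) & 1 )) && echo "logical CPU $cpu"
done
# -> 0, 1, 12, 13: two cores, each with both of its hardware threads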

It is probably related to what's mentioned in the documentation for 
'--ntasks':
"[...] The default is one task per node, but note that the 
--cpus-per-task option will change this default."
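
So explicitly pinning the task count, as you did with -n1, seems to be
another workaround; something like the following should then report a
single task (untested):

srun -n 1 -c 1 bash -c 'echo $SLURM_NTASKS'   # should print 1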

Regards
Hermann

On 3/24/22 1:37 PM, Durai Arasan wrote:
> Hello Slurm users,
> 
> We are experiencing strange behavior with srun executing commands twice 
> only when setting --cpus-per-task=1
> 
> $ srun --cpus-per-task=1 --partition=gpu-2080ti echo foo
> srun: job 1298286 queued and waiting for resources
> srun: job 1298286 has been allocated resources
> foo
> foo
> 
> This is not seen when --cpus-per-task is another value:
> 
> $ srun --cpus-per-task=3 --partition=gpu-2080ti echo foo
> srun: job 1298287 queued and waiting for resources
> srun: job 1298287 has been allocated resources
> foo
> 
> Also when specifying --ntasks:
> $ srun -n1 --cpus-per-task=1 --partition=gpu-2080ti echo foo
> srun: job 1298288 queued and waiting for resources
> srun: job 1298288 has been allocated resources
> foo
> 
> Relevant slurm.conf settings are:
> SelectType=select/cons_tres
> SelectTypeParameters=CR_Core_Memory
> # example node configuration
> NodeName=slurm-bm-58 NodeAddr=xxx.xxx.xxx.xxx Procs=72 Sockets=2 
> CoresPerSocket=18 ThreadsPerCore=2 RealMemory=354566 
> Gres=gpu:rtx2080ti:8 Feature=xx_v2.38 State=UNKNOWN
> 
> On closer inspection of the job variables in the "--cpus-per-task=1" case, the 
> following variables have wrongly acquired a value of 2 for no reason:
> SLURM_NTASKS=2
> SLURM_NPROCS=2
> SLURM_TASKS_PER_NODE=2
> SLURM_STEP_NUM_TASKS=2
> SLURM_STEP_TASKS_PER_NODE=2
> 
> Can you see what could be wrong?
> 
> Best,
> Durai


