[slurm-users] Config: behavior of default CPU number per GPU (DefCpuPerGPU)

GD gd.dev at libertymail.net
Wed Jul 10 13:35:55 UTC 2019


Hi,

I have a question regarding the default number of CPUs allocated per GPU
(`DefCpuPerGPU` in `slurm.conf`). First, note that the documentation refers to
`DefCpusPerGPU` (with an 's' after Cpu) but slurmctld only understands
`DefCpuPerGPU` (cf. https://bugs.schedmd.com/show_bug.cgi?id=7203).
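A quick way to see which spelling your parser actually accepts is to grep the config for both forms (a sketch; the config path below is an assumption based on the `slurm-llnl` package, adjust to your install):

```shell
# Grep the config for either spelling of the per-GPU CPU default.
# slurmctld 19.05 reportedly accepts only DefCpuPerGPU (no 's'),
# while the documentation spells it DefCpusPerGPU.
conf=/etc/slurm-llnl/slurm.conf   # assumed path; adjust for your install
if [ -r "$conf" ]; then
  grep -Ei 'DefCpus?PerGPU' "$conf"
fi
```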

So here is what the `slurm.conf` doc states:
```
DefCpusPerGPU
    Default count of CPUs allocated per allocated GPU
```

Here is an extract of my `slurm.conf` file setting `DefCpuPerGPU=32`:
```
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
FastSchedule=1
SelectTypeParameters=CR_CPU_Memory
PriorityType=priority/multifactor
PriorityFlags=CALCULATE_RUNNING,SMALL_RELATIVE_TO_TIME
PriorityFavorSmall=yes
DefMemPerCPU=2000
MaxMemPerCPU=2800
DefMemPerGPU=80000
DefCpuPerGPU=32

# COMPUTE NODES
GresTypes=gpu
NodeName=XXXX NodeAddr=XXXX Gres=gpu:rtx2080:2,gpu:gtx1080:1 Sockets=4 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=376000 MemSpecLimit=10000 State=UNKNOWN
PartitionName=prod Nodes=XXXX OverSubscribe=YES Default=YES MaxTime=INFINITE DefaultTime=2:0:0 State=UP
```

However, if I request a GPU, I only get 2 CPUs by default (the two hardware
threads of a single core, given `ThreadsPerCore=2`):
```
srun --gpus=1 --pty bash
$ taskset -c -p $$
pid 127735's current affinity list: 1,65
```
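As a cross-check, requesting the CPU count explicitly works around the default (a sketch; `--cpus-per-gpu` is the per-job counterpart of `DefCpuPerGPU` with `select/cons_tres` in 19.05). If this yields 32 CPUs while the default yields 2, the problem is in how the default is applied rather than in the GRES/core binding:

```shell
# Ask for the CPU count explicitly instead of relying on the
# DefCpuPerGPU default, then print the affinity mask seen inside
# the allocation. Guarded so the snippet is safe to paste on a
# machine without Slurm.
if command -v srun >/dev/null 2>&1; then
  srun --gpus=1 --cpus-per-gpu=32 bash -c 'taskset -c -p $$'
fi
```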

I use Slurm 19.05 on an Arch Linux machine, installed from the `slurm-llnl`
AUR package. Here are my `gres.conf` file and my system info:
- gres.conf
```
NodeName=XXXX Name=gpu Type=rtx2080  File=/dev/nvidia0 Cores=32-63
NodeName=XXXX Name=gpu Type=rtx2080  File=/dev/nvidia1 Cores=64-95
NodeName=XXXX Name=gpu Type=gtx1080  File=/dev/nvidia2 Cores=96-127
```
- System info
```
$ uname -a
Linux XXXX 5.1.15-arch1-1-ARCH #1 SMP PREEMPT Tue Jun 25 04:49:39 UTC
2019 x86_64 GNU/Linux
```

Thanks in advance,
Best regards,
Ghislain Durif



