[slurm-users] Problem with configuration CPU/GPU partitions

Pavel Vashchenkov vashen at itam.nsc.ru
Fri Feb 28 03:51:26 UTC 2020


Hello,

I have a hybrid cluster with 2 GPUs and two 20-core CPUs on each node.

I created two partitions:
- "cpu" for CPU-only jobs, which are allowed to allocate up to 38 cores per node
- "gpu" for GPU-only jobs, which are allowed to allocate up to 2 GPUs and 2 CPU cores per node

Respective sections in slurm.conf:

# NODES
NodeName=node[01-06] Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 Gres=gpu:2(S:0-1) RealMemory=257433

# PARTITIONS
PartitionName=cpu Default=YES Nodes=node[01-06] MaxNodes=6 MinNodes=0 DefaultTime=04:00:00 MaxTime=14-00:00:00 MaxCPUsPerNode=38
PartitionName=gpu             Nodes=node[01-06] MaxNodes=6 MinNodes=0 DefaultTime=04:00:00 MaxTime=14-00:00:00 MaxCPUsPerNode=2

and in gres.conf:
Name=gpu Type=v100 File=/dev/nvidia0 Cores=0-19
Name=gpu Type=v100 File=/dev/nvidia1 Cores=20-39

However, this does not seem to work properly. If I first submit a GPU job
that uses all resources available in the "gpu" partition, and then a CPU
job that allocates the remaining CPU cores (i.e. 38 cores per node) in the
"cpu" partition, everything works fine: both jobs start running. But if I
reverse the submission order and submit the CPU job before the GPU job,
the "cpu" job starts running while the "gpu" job stays in the queue with
PENDING status and reason Resources.
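
For reference, the submissions look roughly like this (script names and
exact task counts are placeholders, not my literal command lines):

# GPU job: everything the "gpu" partition allows on each node
sbatch -p gpu -N 6 --ntasks-per-node=2 --gres=gpu:2 gpu_job.sh
# CPU job: the remaining 38 cores per node in the "cpu" partition
sbatch -p cpu -N 6 --ntasks-per-node=38 cpu_job.sh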

My first guess was that the "cpu" job allocates the cores assigned to the
GPUs in gres.conf and thereby prevents the GPU devices from being used.
However, it seems that this is not the case, because requesting 37 cores
per node instead of 38 solves the problem.
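
The cores actually held by the running "cpu" job can be checked with the
detailed job view (the job id and node name below are just examples):

# per-node CPU_IDs and GRES held by the running job
scontrol show job -d <jobid>
# allocated CPUs/TRES on a particular node
scontrol show node node01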

Another thought was that it has something to do with specialized core
reservation, but changing the CoreSpecCount option did not help.
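
In case it matters, the variant I tried was along these lines (the count
shown is just an example):

NodeName=node[01-06] Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 CoreSpecCount=2 Gres=gpu:2(S:0-1) RealMemory=257433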

So, any ideas on how to fix this behavior and where I should look?

Thanks!


