[slurm-users] Gang scheduling using GPUs ?

Tilman Schneider tilman at csquare.ai
Fri Dec 18 10:10:05 UTC 2020


Hello,

we are running into trouble with gang scheduling as soon as GPUs are
involved. We are using the following slurm.conf settings:

ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

SchedulerType=sched/backfill
SchedulerTimeSlice=60
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU_Memory

PreemptType=preempt/qos
PreemptMode=SUSPEND,GANG
PreemptExemptTime=-1

NodeName=cn2 Sockets=4 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=56000
Gres=gpu:geforce_gtx_1080_ti:2
[...]
PartitionName=main Nodes=cn2,cn3,cn4 Default=YES MaxTime=INFINITE State=UP
OverSubscribe=FORCE:4
[...]
----------------------------------------------------------------------------

We use QoS-based preemption so that lower-priority jobs are automatically
pre-empted when higher-priority jobs arrive in the queue, which works
nicely; our QoS setup is sketched just below. However, when we submit
several GPU jobs via sbatch with a script like the one shown further down,
these jobs do not get gang-scheduled, and there is no apparent error
message in the logs. For CPU-only jobs everything works as expected.
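
For completeness, our QoS configuration is roughly along the following
lines (the QoS names "low" and "high" here are illustrative, not our
literal setup):

# lower-priority QoS, eligible to be suspended
sacctmgr add qos low Priority=10
# higher-priority QoS, allowed to preempt jobs running under "low"
sacctmgr add qos high Priority=100 Preempt=low
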
We didn't find any GPU-specific remarks in the gang scheduling
documentation - are we attempting something that is simply not supported,
or are we doing it wrong? Also, is there a way to get more detailed logs
or insight into how the scheduler decides whether or not to form a gang?
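
For instance, would raising the controller verbosity along the following
lines be the right way to watch the gang scheduler's decisions? (Flag
names taken from the slurm.conf and scontrol man pages; we have not
actually tried this yet.)

# in slurm.conf:
SlurmctldDebug=debug
DebugFlags=Gang,Gres

# or at runtime on the controller node, without a restart:
scontrol setdebugflags +Gang
scontrol setdebugflags +Gres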

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gpus=2
#SBATCH --partition=main,interactive

IMAGES_DIR="/path/to/images"
IMAGE="nvcr.io/nvidia/cuda:10.0-base"

srun --container-image="$IMAGES_DIR/$IMAGE.sqsh" bash ...
----------------------------------------------------------------------------
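
For reference, we check whether time-slicing kicks in by watching the job
states: with gang scheduling we would expect two such jobs to alternate
between R (running) and S (suspended) roughly every SchedulerTimeSlice
(60 s in our config), e.g. via:

watch -n 5 'squeue -p main -o "%.10i %.2t %.10M %R"'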

Thanks for reading & have a nice weekend
Tilman