[slurm-users] Problem: requesting specific GPU GRES type is ignored in job submission

GD gd.dev at libertymail.net
Wed Jul 10 09:10:03 UTC 2019


Hi,

I have an issue with GPU requests at job submission. I have a single
compute node (128 cores, 3 GPUs) which also runs the Slurm server.

When I try to submit a job requesting a specific GPU type, here a GTX
1080 (GPU id 2 on my machine), the job is not assigned to the
requested GPU (cf. Example 1 below).

However, if no other GPU is available, a job can be assigned to the GPU
in question (cf. Example 2).

Example 1: no resources are in use; requesting a specific GPU type
"gtx1080" but getting an RTX 2080 (not working)
```
srun --gpus=gtx1080:1 --pty bash
$ nvidia-smi -L
GPU 0: GeForce RTX 2080 Ti (UUID: GPU-a80xxxxxxxxxx)
```
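
For comparison, the same typed request expressed with the per-node
`--gres` syntax (as opposed to the per-job `--gpus` flag above) would
look like the sketch below; I include it only as a point of reference:
```
# same typed GPU request, per-node GRES syntax (for comparison only)
srun --gres=gpu:gtx1080:1 --pty bash
```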

Example 2: filling the GPUs in ascending order; the first job gets GPUs 0
and 1 (the two RTX 2080s), and the second gets GPU 2 (working as expected)
```
# terminal 1
srun --gpus=2 --pty bash
$ nvidia-smi -L
GPU 0: GeForce RTX 2080 Ti (UUID: GPU-a80xxxxxxxxxx)
GPU 1: GeForce RTX 2080 Ti (UUID: GPU-d63xxxxxxxxxx)

# terminal 2
srun --gpus=1 --pty bash
$ nvidia-smi -L
GPU 0: GeForce GTX 1080 (UUID: GPU-f58xxxxxxxxxx)
```
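
In case it helps, here is roughly how I inspect what Slurm itself
reports for the allocation, as opposed to what nvidia-smi shows inside
the job (a sketch; field and variable names may differ slightly between
versions):
```
# inside the job shell: devices the gres/gpu plugin exposed to the step
echo $CUDA_VISIBLE_DEVICES
echo $SLURM_JOB_GPUS

# from another shell: detailed allocation, including per-node GRES indices
scontrol show job -d <jobid> | grep -i gres
```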

I use Slurm 19.05 on an Arch Linux machine (version below), installed
from the `slurm-llnl` AUR package.

slurm.conf excerpt (cf. attached file)
```
# COMPUTE NODES
GresTypes=gpu
NodeName=XXXX NodeAddr=XXXX Gres=gpu:rtx2080:2,gpu:gtx1080:1 Sockets=4 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=376000 MemSpecLimit=10000 State=UNKNOWN
PartitionName=prod Nodes=XXXX OverSubscribe=YES Default=YES MaxTime=INFINITE DefaultTime=2:0:0 State=UP
```

gres.conf (cf. attached file)
```
NodeName=XXXX Name=gpu Type=rtx2080  File=/dev/nvidia0 Cores=32-63
NodeName=XXXX Name=gpu Type=rtx2080  File=/dev/nvidia1 Cores=64-95
NodeName=XXXX Name=gpu Type=gtx1080  File=/dev/nvidia2 Cores=96-127
```
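
To double-check that the controller has registered the typed GRES from
this file, I look at the node record (a sketch; the exact fields shown
can vary with version and TRES settings):
```
# what the controller knows about the node's configured GRES
scontrol show node XXXX | grep -iE "gres|tres"
```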

System info
```
$ uname -a
Linux XXXX 5.1.15-arch1-1-ARCH #1 SMP PREEMPT Tue Jun 25 04:49:39 UTC 2019 x86_64 GNU/Linux
```

Thanks in advance

Best regards,
Ghislain Durif

-------------- next part --------------
#
# Example slurm.conf file. Please run configurator.html
# (in doc/html) to build a configuration file customized
# for your environment.
#
#
# slurm.conf file generated by configurator.html.
#
# See the slurm.conf man page for more information.
#
ClusterName=YYYY
ControlMachine=XXXX
#ControlAddr=
#BackupController=
#BackupAddr=
#
SlurmUser=slurm
#SlurmdUser=root
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
StateSaveLocation=/var/spool/slurm/ctld
SlurmdSpoolDir=/var/spool/slurm/d
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/pgid
#PluginDir=
#FirstJobId=
ReturnToService=0
#MaxJobCount=
#PlugStackConfig=
#PropagatePrioProcess=
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#Prolog=
#Epilog=
#SrunProlog=
#SrunEpilog=
#TaskProlog=
#TaskEpilog=
TaskPlugin=task/cgroup
#TrackWCKey=no
#TreeWidth=50
#TmpFS=
#UsePAM=
#
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
#
# SCHEDULING
SchedulerType=sched/backfill
#SchedulerAuth=
#SelectType=select/linear
SelectType=select/cons_tres
FastSchedule=1
SelectTypeParameters=CR_CPU_Memory
#PriorityType=priority/multifactor
#PriorityDecayHalfLife=14-0
#PriorityUsageResetPeriod=14-0
#PriorityWeightFairshare=100000
#PriorityWeightAge=1000
#PriorityWeightPartition=10000
#PriorityWeightJobSize=1000
#PriorityMaxAge=1-0
PriorityType=priority/multifactor
PriorityFlags=CALCULATE_RUNNING,SMALL_RELATIVE_TO_TIME
PriorityFavorSmall=yes
DefMemPerCPU=2000
MaxMemPerCPU=2800
DefMemPerGPU=80000
DefCpuPerGPU=32
#
# LOGGING
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurmd.log
JobCompType=jobcomp/none
#JobCompLoc=
#
# ACCOUNTING
#JobAcctGatherType=jobacct_gather/linux
#JobAcctGatherFrequency=30
#
#AccountingStorageType=accounting_storage/slurmdbd
#AccountingStorageHost=
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStorageUser=
#
# COMPUTE NODES
GresTypes=gpu
NodeName=XXXX NodeAddr=XXXX Gres=gpu:rtx2080:2,gpu:gtx1080:1 Sockets=4 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=376000 MemSpecLimit=10000 State=UNKNOWN
PartitionName=prod Nodes=XXXX OverSubscribe=YES Default=YES MaxTime=INFINITE DefaultTime=2:0:0 State=UP

