[slurm-users] gres:gpu management

Aaron Jackson aaron at aaronsplace.co.uk
Thu May 23 07:54:17 UTC 2019


> Hello,
>
>  we are running Slurm 18.08.6 and have problems with GRES GPU
> management. There is a "gpu" partition with 12 nodes, each with 4 Tesla
> V100 cards. Allocation of the GPUs is working, and GPU management for
> sbatch/srun jobs is working too - CUDA_VISIBLE_DEVICES is set correctly
> according to the --gres=gpu:x option. But we have problems with GPU
> management for job steps. If I try this example:
>
> #!/bin/bash
> #
> # gres_test.bash
> # Submit as follows:
> # sbatch -p gpu --gres=gpu:4 -n4 gres_test.bash
> #
> echo JOB $SLURM_JOB_ID CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES
> srun --gres=gpu:1 -n1 --exclusive show_device.sh &
> srun --gres=gpu:1 -n1 --exclusive show_device.sh &
> srun --gres=gpu:1 -n1 --exclusive show_device.sh &
> srun --gres=gpu:1 -n1 --exclusive show_device.sh &
> wait
>
> cat show_device.sh
> #!/bin/bash
> echo JOB $SLURM_JOB_ID STEP $SLURM_STEP_ID CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES
>
>
> I get:
> JOB 49614 CUDA_VISIBLE_DEVICES=0,1,2,3
> JOB 49614 STEP 0 CUDA_VISIBLE_DEVICES=0
> JOB 49614 STEP 1 CUDA_VISIBLE_DEVICES=0
> JOB 49614 STEP 2 CUDA_VISIBLE_DEVICES=0
> JOB 49614 STEP 3 CUDA_VISIBLE_DEVICES=0
>
> But according to https://slurm.schedmd.com/gres.html I'm expecting:
>
> JOB 49614 CUDA_VISIBLE_DEVICES=0,1,2,3
> JOB 49614 STEP 0 CUDA_VISIBLE_DEVICES=0
> JOB 49614 STEP 1 CUDA_VISIBLE_DEVICES=1
> JOB 49614 STEP 2 CUDA_VISIBLE_DEVICES=2
> JOB 49614 STEP 3 CUDA_VISIBLE_DEVICES=3
>
> So we are not able to distribute job steps to different GPUs inside an
> sbatch job. We can use a wrapper like this:
>
> #!/bin/bash
> # Pick the GPU whose index matches this step's ID (steps 0-3 -> GPUs 0-3)
> export CUDA_VISIBLE_DEVICES=$SLURM_STEPID
> my_job
>
> but a Slurm built-in solution is better and more robust.
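>
> For illustration, with the wrapper saved as gpu_wrap.sh (the name is
> arbitrary, and this assumes the steps are not confined to a single
> device by cgroups), the batch script would launch it like this:
>
> srun --gres=gpu:1 -n1 --exclusive ./gpu_wrap.sh &
> srun --gres=gpu:1 -n1 --exclusive ./gpu_wrap.sh &
> srun --gres=gpu:1 -n1 --exclusive ./gpu_wrap.sh &
> srun --gres=gpu:1 -n1 --exclusive ./gpu_wrap.sh &
> wait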
>
> GRES section of slurm.conf
>
> AccountingStorageTRES=gres/gpu
> JobAcctGatherType=jobacct_gather/cgroup
> GresTypes=gpu
> NodeName=n[21-32] Gres=gpu:v100:4 Sockets=2 CoresPerSocket=18 ThreadsPerCore=2 RealMemory=384000 TmpDisk=150000 State=UNKNOWN Weight=1000
> PartitionName=gpu Nodes=n[21-32] Default=NO MaxTime=24:00:00 State=UP Priority=5 PriorityTier=15 OverSubscribe=FORCE
>
>
> /etc/slurm/gres.conf
> Name=gpu Type=v100 File=/dev/nvidia0 CPUs=0-17,36-53
> Name=gpu Type=v100 File=/dev/nvidia1 CPUs=0-17,36-53
> Name=gpu Type=v100 File=/dev/nvidia2 CPUs=18-35,54-71
> Name=gpu Type=v100 File=/dev/nvidia3 CPUs=18-35,54-71
>
> Any help appreciated.
>
> Thanks, Daniel Vecerka CTU Prague

Do the steps actually end up on the same GPU, though? cgroups will
always refer to the first allocated GPU as 0, so it is not unexpected
for each step to have CUDA_VISIBLE_DEVICES set to 0. Make sure you have
the following in /etc/cgroup.conf:

   ConstrainDevices=yes
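
A quick way to confirm which physical card each step really gets (just
a sketch, assuming nvidia-smi is available on the compute nodes; the
script name is only illustrative) is to print the GPU UUID from inside
every step instead of relying on the device index:

   #!/bin/bash
   # check_gpus.bash
   # Submit as: sbatch -p gpu --gres=gpu:4 -n4 check_gpus.bash
   # Four different UUIDs mean the steps are on different physical GPUs,
   # even though cgroups renumber the visible device to 0 in each step.
   for i in 1 2 3 4; do
       srun --gres=gpu:1 -n1 --exclusive \
           nvidia-smi --query-gpu=uuid --format=csv,noheader &
   done
   wait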

Aaron


-- 
Aaron Jackson, Research Associate
Computer Vision Lab, University of Nottingham
http://aaronsplace.co.uk


