[slurm-users] Help with binding GPUs to sockets (NVlink, P2P)

Luis Altenkort altenkort at physik.uni-bielefeld.de
Thu Jun 27 13:15:43 UTC 2019


Hello everyone,
I have several nodes with 2 sockets each and 4 GPUs per socket (i.e. 8
GPUs per node). I now want to tell Slurm that the GPUs with device IDs
0,1,2,3 are connected to socket 0 and GPUs 4,5,6,7 are connected to
socket 1. I want to do this in order to be able to use the new option
--gpus-per-socket. All GPUs on one socket are directly linked via NVLink
and use P2P communication. In the end I want to be able to run multi-GPU
jobs on GPUs that are all on the same socket (and not distributed across
sockets or nodes). What do I have to change in my slurm.conf and how
would I submit jobs? Like this?:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --partition=volta
#SBATCH --gpus-per-socket=4
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
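
For context, here is a sketch of what I currently assume the
socket binding would look like in slurm.conf and gres.conf (node
names, core counts, and device paths are placeholders for my
hardware, and I have not verified this):

# slurm.conf (placeholder node definition)
GresTypes=gpu
NodeName=gpu[01-04] Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 Gres=gpu:8

# gres.conf on each node (placeholder device files and core ranges)
# GPUs 0-3 bound to the cores of socket 0, GPUs 4-7 to socket 1
NodeName=gpu[01-04] Name=gpu File=/dev/nvidia[0-3] Cores=0-15
NodeName=gpu[01-04] Name=gpu File=/dev/nvidia[4-7] Cores=16-31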

Slurm is on version 19.05.0.

Thanks in advance!
