[slurm-users] Help with binding GPUs to sockets (NVlink, P2P)
Luis Altenkort
altenkort at physik.uni-bielefeld.de
Fri Jun 28 09:29:00 UTC 2019
Hi,
thanks for the answer. We actually had this set up correctly already; I
simply forgot to add #SBATCH --sockets-per-node=1 to my script. Now
--gpus-per-socket works!
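
For future readers, a minimal sketch of the now-working submit script
(this just combines my original script below with the missing
directive; the application launch line at the end is hypothetical):

    #!/bin/bash
    #SBATCH --job-name=test
    #SBATCH --partition=volta
    # Constrain the job to a single socket so that --gpus-per-socket
    # can place all requested GPUs on that socket
    #SBATCH --sockets-per-node=1
    #SBATCH --gpus-per-socket=4
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=1

    srun ./my_multi_gpu_app   # hypothetical application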
On 28.06.19 at 09:27, Daniel Vecerka wrote:
> Hi,
>
> I'm not sure how it works in 19.05, but with 18.x it's possible to
> specify the CPU affinity in the file /etc/slurm/gres.conf:
> Name=gpu Type=v100 File=/dev/nvidia0 CPUs=0-17,36-53
> Name=gpu Type=v100 File=/dev/nvidia1 CPUs=0-17,36-53
> Name=gpu Type=v100 File=/dev/nvidia2 CPUs=18-35,54-71
> Name=gpu Type=v100 File=/dev/nvidia3 CPUs=18-35,54-71
>
> You can get the CPU ranges with the following command:
>
> nvidia-smi topo -m
>         GPU0    GPU1    GPU2    GPU3    mlx5_0  CPU Affinity
> GPU0     X      NV2     NV2     NV2     NODE    0-17,36-53
> GPU1    NV2      X      NV2     NV2     NODE    0-17,36-53
> GPU2    NV2     NV2      X      NV2     SYS     18-35,54-71
> GPU3    NV2     NV2     NV2      X      SYS     18-35,54-71
> mlx5_0  NODE    NODE    SYS     SYS     X
>
>
> Best regards, Daniel
>
>
> On 27.06.2019 15:17, Luis Altenkort wrote:
>> Hello everyone,
>> I have several nodes with 2 sockets each and 4 GPUs per socket (i.e.
>> 8 GPUs per node). I now want to tell Slurm that the GPUs with device
>> IDs 0,1,2,3 are connected to socket 0 and GPUs 4,5,6,7 are connected
>> to socket 1. I want to do this in order to be able to use the new
>> option --gpus-per-socket. All GPUs on one socket are directly linked
>> via NVLink and use P2P communication. In the end I want to be able to
>> run multi-GPU jobs on GPUs that are all on one socket (and not
>> distributed across sockets or nodes). What do I have to change in my
>> slurm.conf, and how would I submit jobs? Like this?
>> #!/bin/bash
>> #SBATCH --job-name=test
>> #SBATCH --partition=volta
>> #SBATCH --gpus-per-socket=4
>> #SBATCH --ntasks=1
>> #SBATCH --cpus-per-task=1
>>
>> Slurm is on version 19.05.0.
>>
>> Thanks in advance!
>>
>
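
In case it helps others: following Daniel's pattern, a gres.conf for
our 2-socket, 8-GPU nodes might look like the sketch below. The CPU
ranges are placeholders, not our real topology; take the actual ranges
from "nvidia-smi topo -m" on each node. The Type name, device files,
and the node name and core counts in the slurm.conf lines are
assumptions.

    # /etc/slurm/gres.conf (sketch; CPU ranges are placeholders)
    # GPUs 0-3 sit on socket 0, GPUs 4-7 on socket 1
    Name=gpu Type=v100 File=/dev/nvidia0 CPUs=0-17,36-53
    Name=gpu Type=v100 File=/dev/nvidia1 CPUs=0-17,36-53
    Name=gpu Type=v100 File=/dev/nvidia2 CPUs=0-17,36-53
    Name=gpu Type=v100 File=/dev/nvidia3 CPUs=0-17,36-53
    Name=gpu Type=v100 File=/dev/nvidia4 CPUs=18-35,54-71
    Name=gpu Type=v100 File=/dev/nvidia5 CPUs=18-35,54-71
    Name=gpu Type=v100 File=/dev/nvidia6 CPUs=18-35,54-71
    Name=gpu Type=v100 File=/dev/nvidia7 CPUs=18-35,54-71

    # Matching slurm.conf entries (hypothetical node name/core counts)
    GresTypes=gpu
    NodeName=gpu01 Sockets=2 CoresPerSocket=18 ThreadsPerCore=2 Gres=gpu:v100:8

Note that Slurm needs the socket layout from the node definition
(Sockets=2 here) for --gpus-per-socket to do anything, which is also
why the job itself needs --sockets-per-node=1.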