[slurm-users] Slurm -- using GPU cards with NVLINK
David Baker
D.J.Baker at soton.ac.uk
Fri Sep 11 07:37:51 UTC 2020
Hi Ryan,
Thank you very much for your reply. That is useful. We'll see how we get on.
Best regards,
David
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Ryan Novosielski <novosirj at rutgers.edu>
Sent: 11 September 2020 00:08
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Slurm -- using GPU cards with NVLINK
I’m fairly sure you set this up the same way you would for a peer-to-peer setup. Here’s ours:
[root at cuda001 ~]# nvidia-smi topo --matrix
        GPU0  GPU1  GPU2  GPU3  mlx4_0  CPU Affinity
GPU0     X    PIX   SYS   SYS   PHB     0-11
GPU1    PIX    X    SYS   SYS   PHB     0-11
GPU2    SYS   SYS    X    PIX   SYS     12-23
GPU3    SYS   SYS   PIX    X    SYS     12-23
mlx4_0  PHB   PHB   SYS   SYS    X
[root at cuda001 ~]# cat /etc/slurm/gres.conf
…
# 2 x K80 (perceval)
NodeName=cuda[001-008] Name=gpu File=/dev/nvidia[0-1] CPUs=0-11
NodeName=cuda[001-008] Name=gpu File=/dev/nvidia[2-3] CPUs=12-23
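On newer Slurm releases (19.05 or later with the cons_tres plugin) you can also tell Slurm about the NVLINK topology itself, either by letting it query NVML or by hand with the Links= parameter in gres.conf. A minimal sketch for nodes like yours (4 GPUs, NVLINK pairs 0-1 and 2-3; the node range alpha[51-58] is only a placeholder):

# slurm.conf (sketch) -- declare the GRES type and per-node GPU count
GresTypes=gpu
NodeName=alpha[51-58] Gres=gpu:4    # plus the usual CPU/memory attributes

# gres.conf, option 1: let Slurm discover the topology via NVML
# (requires slurmd built against the NVIDIA NVML library)
AutoDetect=nvml

# gres.conf, option 2: describe the topology by hand
# Links= is an ordered, comma-separated list with one entry per GPU index;
# -1 marks the device itself, non-zero values count NVLINK connections.
NodeName=alpha[51-58] Name=gpu File=/dev/nvidia0 CPUs=0,2,4,6,8,10 Links=-1,2,0,0
NodeName=alpha[51-58] Name=gpu File=/dev/nvidia1 CPUs=0,2,4,6,8,10 Links=2,-1,0,0
NodeName=alpha[51-58] Name=gpu File=/dev/nvidia2 CPUs=1,3,5,7,9,11 Links=0,0,-1,2
NodeName=alpha[51-58] Name=gpu File=/dev/nvidia3 CPUs=1,3,5,7,9,11 Links=0,0,2,-1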
This also seems to be related:
https://slurm.schedmd.com/SLUG19/GPU_Scheduling_and_Cons_Tres.pdf
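On the submission side nothing NVLINK-specific should be needed: a job that requests two GPUs on one of those nodes gets a pair, and once Slurm knows the topology (via Links= or NVML) it should prefer the better-connected pair. A hedged example batch script, assuming a hypothetical partition named gpu:

#!/bin/bash
#SBATCH --partition=gpu        # hypothetical partition name
#SBATCH --nodes=1
#SBATCH --gres=gpu:2           # two GPUs; with topology info Slurm favours an NVLINK-linked pair
#SBATCH --cpus-per-task=6

srun nvidia-smi topo --matrix  # confirm which pair the job was actually given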
--
____
|| \\UTGERS, |---------------------------*O*---------------------------
||_// the State | Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
|| \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark
`'
> On Sep 10, 2020, at 11:00 AM, David Baker <D.J.Baker at soton.ac.uk> wrote:
>
> Hello,
>
> We are installing a group of nodes which all contain 4 GPU cards. The GPUs are paired together using NVLINK as described in the matrix below.
>
> We are familiar with using Slurm to schedule and run jobs on GPU cards, but this is the first time we have dealt with NVLINK-enabled GPUs. Could someone please advise us how to configure Slurm so that we can submit jobs to the cards and make use of the NVLINK? That is, what do we need to put in gres.conf or slurm.conf, and which sbatch options should users specify? I presume, for example, that a job could use one GPU card and potentially make use of memory on its paired card.
>
> Best regards,
> David
>
> [root at alpha51 ~]# nvidia-smi topo --matrix
>         GPU0  GPU1  GPU2  GPU3  CPU Affinity  NUMA Affinity
> GPU0     X    NV2   SYS   SYS   0,2,4,6,8,10  0
> GPU1    NV2    X    SYS   SYS   0,2,4,6,8,10  0
> GPU2    SYS   SYS    X    NV2   1,3,5,7,9,11  1
> GPU3    SYS   SYS   NV2    X    1,3,5,7,9,11  1