<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">

<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>

</head>

<body dir="ltr">

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Hi Ryan,</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Thank you very much for your reply. That is useful. We'll see how we get on.</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Best regards,</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

David</div>

<div id="appendonsend"></div>

<hr style="display:inline-block;width:98%" tabindex="-1">

<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt; on behalf of Ryan Novosielski &lt;novosirj@rutgers.edu&gt;<br>

<b>Sent:</b> 11 September 2020 00:08<br>

<b>To:</b> Slurm User Community List &lt;slurm-users@lists.schedmd.com&gt;<br>

<b>Subject:</b> Re: [slurm-users] Slurm -- using GPU cards with NVLINK</font>

<div> </div>

</div>

<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">

<div class="PlainText">I’m fairly sure that this is set up the same way as a peer-to-peer setup. Here’s ours:<br>

<br>

[root@cuda001 ~]# nvidia-smi topo --matrix<br>

        GPU0    GPU1    GPU2    GPU3    mlx4_0  CPU Affinity<br>

GPU0     X      PIX     SYS     SYS     PHB     0-11<br>

GPU1    PIX      X      SYS     SYS     PHB     0-11<br>

GPU2    SYS     SYS      X      PIX     SYS     12-23<br>

GPU3    SYS     SYS     PIX      X      SYS     12-23<br>

mlx4_0  PHB     PHB     SYS     SYS      X <br>

<br>

[root@cuda001 ~]# cat /etc/slurm/gres.conf <br>

<br>

…<br>

<br>

# 2 x K80 (perceval)<br>

NodeName=cuda[001-008] Name=gpu File=/dev/nvidia[0-1] CPUs=0-11<br>

NodeName=cuda[001-008] Name=gpu File=/dev/nvidia[2-3] CPUs=12-23<br>
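<br>
For the NVLINK pairs in the matrix quoted below, gres.conf also accepts a Links= list per device: -1 for the device itself, 0 for no link, otherwise the number of links. Here is a minimal Python sketch that derives such lines from the nvidia-smi output. The gres_lines helper is hypothetical, and the sketch assumes the node is named alpha51, that /dev/nvidiaN corresponds to GPUN in nvidia-smi order, and that the GPU columns come first in the matrix:<br>
<br>
<pre>
```python
import re

# Matrix copied from the quoted message below (node alpha51).
TOPO = """\
        GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity
GPU0     X      NV2     SYS     SYS     0,2,4,6,8,10    0
GPU1    NV2      X      SYS     SYS     0,2,4,6,8,10    0
GPU2    SYS     SYS      X      NV2     1,3,5,7,9,11    1
GPU3    SYS     SYS     NV2      X      1,3,5,7,9,11    1
"""


def gres_lines(topo, node):
    """Build one gres.conf line per GPU, with Links= taken from the matrix.

    Hypothetical helper: the node name and the /dev/nvidiaN numbering
    are assumptions and should be checked on the real hardware.
    """
    # Keep only the per-GPU rows; the header row starts with whitespace.
    rows = [line.split() for line in topo.splitlines()
            if re.match(r"GPU\d+\s", line)]
    ngpu = len(rows)
    out = []
    for idx, row in enumerate(rows):
        links = []
        for cell in row[1:ngpu + 1]:      # GPU-to-GPU columns only
            if cell == "X":
                links.append("-1")        # the device itself
            elif cell.startswith("NV"):
                links.append(cell[2:])    # NV2 means two NVLINK connections
            else:
                links.append("0")         # PIX, PHB, SYS: no NVLINK
        out.append(
            f"NodeName={node} Name=gpu File=/dev/nvidia{idx} "
            f"Links={','.join(links)}"
        )
    return out


for line in gres_lines(TOPO, "alpha51"):
    print(line)
```
</pre>
If Slurm is built with NVML support, AutoDetect=nvml in gres.conf should fill in the same information without a script.<br>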

<br>

This also seems to be related:<br>

<br>

<a href="https://slurm.schedmd.com/SLUG19/GPU_Scheduling_and_Cons_Tres.pdf">https://slurm.schedmd.com/SLUG19/GPU_Scheduling_and_Cons_Tres.pdf</a><br>

<br>

--<br>

____<br>

|| \\UTGERS,      |---------------------------*O*---------------------------<br>

||_// the State  |         Ryan Novosielski - novosirj@rutgers.edu<br>

|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus<br>

||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark<br>

     `'<br>

<br>

> On Sep 10, 2020, at 11:00 AM, David Baker &lt;D.J.Baker@soton.ac.uk&gt; wrote:<br>

> <br>

> Hello,<br>

> <br>

> We are installing a group of nodes which all contain 4 GPU cards. The GPUs are paired together using NVLINK as described in the matrix below.

<br>

> <br>

> We are familiar with using Slurm to schedule and run jobs on GPU cards, but this is the first time we have dealt with NVLINK-enabled GPUs. Could someone please advise us how to configure Slurm so that we can submit jobs to the cards and make use of the NVLINK? That is, what do we need to put in the gres.conf or slurm.conf, and how should users use the sbatch command? I presume, for example, that a user could make use of a GPU card, and potentially make use of memory on the paired card.<br>

> <br>

> Best regards,<br>

> David<br>

> <br>

> [root@alpha51 ~]# nvidia-smi topo --matrix<br>

>         GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity<br>

> GPU0     X      NV2     SYS     SYS     0,2,4,6,8,10    0<br>

> GPU1    NV2      X      SYS     SYS     0,2,4,6,8,10    0<br>

> GPU2    SYS     SYS      X      NV2     1,3,5,7,9,11    1<br>

> GPU3    SYS     SYS     NV2      X      1,3,5,7,9,11    1<br>

<br>

</div>

</span></font></div>

</body>

</html>