[slurm-users] Using GRES to manage GPUs, but unable to assign specific CPUs to specific GPUs

Randall Radmer radmer at gmail.com
Tue Sep 18 08:33:50 MDT 2018


Thanks Julie!  Figured I was missing something.

-Randy

On Mon, Sep 17, 2018 at 8:52 PM Julie Bernauer <jbernauer at nvidia.com> wrote:

> Hi Randy,
>
> This is expected on an HT (hyperthreaded) machine like the one described
> below.  If you run lstopo, you see:
>       L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
>         PU L#10 (P#5)
>         PU L#11 (P#45)
> Slurm uses the logical core numbering, so cores 10 and 11 give you
> "physical" CPUs 5 and 45.
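>
> As a quick cross-check (a rough sketch, assuming the standard Linux sysfs
> layout), the kernel reports the same sibling pair for that core:
>
> $ cat /sys/devices/system/cpu/cpu5/topology/thread_siblings_list
> 5,45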
>
> Julie
>
>
>
> ------------------------------
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
> Randall Radmer <radmer at gmail.com>
> *Sent:* Wednesday, September 12, 2018 10:14 AM
> *To:* slurm-users at lists.schedmd.com
> *Subject:* [slurm-users] Using GRES to manage GPUs, but unable to assign
> specific CPUs to specific GPUs
>
> I’m using GRES to manage eight GPUs in a node on a new Slurm cluster and
> am trying to bind specific CPUs to specific GPUs, but it’s not working as I
> expected.
>
> I am able to request a specific number of GPUs, but the CPU assignment
> seems wrong.
>
> I assume I’m missing something obvious, but just can't find it.  Any
> suggestion for how to fix it, or how to better investigate the problem,
> would be much appreciated.
>
> Example srun requesting one GPU follows:
> $ srun -p dgx1 --gres=gpu:1 --pty $SHELL
> [node-01:~]$ nvidia-smi --query-gpu=index,name --format=csv
> index, name
> 0, Tesla V100-SXM2-16GB
> [node-01:~]$ cat /sys/fs/cgroup/cpuset/slurm/uid_*/job_*/cpuset.cpus
> 5,45
>
> Similar example requesting eight GPUs follows:
> $ srun -p dgx1 --gres=gpu:8 --pty $SHELL
> [node-01:~]$ nvidia-smi --query-gpu=index,name --format=csv
> index, name
> 0, Tesla V100-SXM2-16GB
> 1, Tesla V100-SXM2-16GB
> 2, Tesla V100-SXM2-16GB
> 3, Tesla V100-SXM2-16GB
> 4, Tesla V100-SXM2-16GB
> 5, Tesla V100-SXM2-16GB
> 6, Tesla V100-SXM2-16GB
> 7, Tesla V100-SXM2-16GB
> [node-01:~]$ cat /sys/fs/cgroup/cpuset/slurm/uid_*/job_*/cpuset.cpus
> 5,45
>
> The machines all run Ubuntu 16.04 and the Slurm version is 17.11.9-2.
>
> The /etc/slurm/gres.conf file follows:
> [node-01:~]$ less /etc/slurm/gres.conf
> Name=gpu Type=V100 File=/dev/nvidia0 Cores=10-11
> Name=gpu Type=V100 File=/dev/nvidia1 Cores=12-13
> Name=gpu Type=V100 File=/dev/nvidia2 Cores=14-15
> Name=gpu Type=V100 File=/dev/nvidia3 Cores=16-17
> Name=gpu Type=V100 File=/dev/nvidia4 Cores=18-19
> Name=gpu Type=V100 File=/dev/nvidia5 Cores=20-21
> Name=gpu Type=V100 File=/dev/nvidia6 Cores=22-23
> Name=gpu Type=V100 File=/dev/nvidia7 Cores=24-25
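>
> (For reference, one rough way to see how these Cores= indices line up with
> the OS processor numbers is hwloc's lstopo, for example:
>
> $ lstopo --only pu
> ...
> PU L#10 (P#5)
> PU L#11 (P#45)
> ...
>
> The exact invocation may vary with the hwloc version installed.)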
>
> The /etc/slurm/slurm.conf file on all machines in the cluster follows
> (with minor cleanup):
> ClusterName=testcluster
> ControlMachine=slurm-master
> SlurmUser=slurm
> SlurmctldPort=6817
> SlurmdPort=6818
> AuthType=auth/munge
> SlurmdSpoolDir=/var/spool/slurm/d
> SwitchType=switch/none
> MpiDefault=none
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmdPidFile=/var/run/slurmd.pid
> ProctrackType=proctrack/cgroup
> PluginDir=/usr/lib/slurm
> ReturnToService=2
> Prolog=/etc/slurm/slurm.prolog
> PrologSlurmctld=/etc/slurm/slurm.ctld.prolog
> Epilog=/etc/slurm/slurm.epilog
> EpilogSlurmctld=/etc/slurm/slurm.ctld.epilog
> TaskProlog=/etc/slurm/slurm.task.prolog
> TaskPlugin=task/affinity,task/cgroup
> SlurmctldTimeout=300
> SlurmdTimeout=300
> InactiveLimit=0
> MinJobAge=300
> KillWait=20
> Waittime=0
> SchedulerType=sched/backfill
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
> FastSchedule=0
> DebugFlags=CPU_Bind,gres
> SlurmctldDebug=debug5
> SlurmctldLogFile=/var/log/slurm/slurmctld.log
> SlurmdDebug=3
> SlurmdLogFile=/var/log/slurm/slurmd.log
> JobCompType=jobcomp/filetxt
> JobCompLoc=/data/slurm/job_completions.log
> AccountingStorageType=accounting_storage/slurmdbd
> AccountingStorageLoc=/data/slurm/accounting_storage.log
> AccountingStorageEnforce=associations,limits,qos
> AccountingStorageTRES=gres/gpu,gres/gpu:V100
> PreemptMode=SUSPEND,GANG
> PrologFlags=Serial,Alloc
> RebootProgram="/sbin/shutdown -r 3"
> PreemptType=preempt/partition_prio
> CacheGroups=0
> DefMemPerCPU=2048
> GresTypes=gpu
> NodeName=node-01 State=UNKNOWN \
>                  Sockets=2 CoresPerSocket=20 ThreadsPerCore=2 \
>                  Gres=gpu:V100:8
> PartitionName=all Nodes=node-01 \
>                   Default=YES MaxTime=4:0:0 DefaultTime=4:0:0 State=UP
>
>
> Thanks,
> Randy
>