[slurm-users] Forcing CPU bindings

Sean Crosby richardnixonshead at gmail.com
Thu May 31 03:00:17 MDT 2018


Hi,

When a user requests all of the GPUs on a node but fewer than the total
number of CPUs, the resulting CPU bindings aren't ideal:

[root at host ~]# nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    mlx5_3  mlx5_1  mlx5_2  mlx5_0  CPU Affinity
GPU0     X      PHB     SYS     SYS     SYS     PHB     SYS     PHB     0-0,2-2,4-4,6-6,8-8,10-10,12-12,14-14,16-16,18-18,20-20,22-22
GPU1    PHB      X      SYS     SYS     SYS     PHB     SYS     PHB     0-0,2-2,4-4,6-6,8-8,10-10,12-12,14-14,16-16,18-18,20-20,22-22
GPU2    SYS     SYS      X      PHB     PHB     SYS     PHB     SYS     1-1,3-3,5-5,7-7,9-9,11-11,13-13,15-15,17-17,19-19,21-21,23-23
GPU3    SYS     SYS     PHB      X      PHB     SYS     PHB     SYS     1-1,3-3,5-5,7-7,9-9,11-11,13-13,15-15,17-17,19-19,21-21,23-23
mlx5_3  SYS     SYS     PHB     PHB      X      SYS     PIX     SYS
mlx5_1  PHB     PHB     SYS     SYS     SYS      X      SYS     PIX
mlx5_2  SYS     SYS     PHB     PHB     PIX     SYS      X      SYS
mlx5_0  PHB     PHB     SYS     SYS     SYS     PIX     SYS      X

$ cat /usr/local/slurm/etc/gres.conf
NodeName=host Name=gpu Type=p100 File=/dev/nvidia0 Cores=0,2,4,6,8,10,12,14,16,18,20,22
NodeName=host Name=gpu Type=p100 File=/dev/nvidia1 Cores=0,2,4,6,8,10,12,14,16,18,20,22
NodeName=host Name=gpu Type=p100 File=/dev/nvidia2 Cores=1,3,5,7,9,11,13,15,17,19,21,23
NodeName=host Name=gpu Type=p100 File=/dev/nvidia3 Cores=1,3,5,7,9,11,13,15,17,19,21,23
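
For reference, the even/odd Cores= split above is meant to mirror the node's
NUMA layout; it can be cross-checked with standard Linux tools (nothing
Slurm-specific), e.g.:

# Per-CPU to NUMA node mapping, plus the per-node CPU and memory summary
lscpu --extended=CPU,NODE
numactl --hardware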

[scrosby at thespian ~]$ sinteractive -n 20 --gres=gpu:p100:4
srun: job 612 queued and waiting for resources
srun: job 612 has been allocated resources
[scrosby at host ~]$ cat /sys/fs/cgroup/cpuset/slurm/uid_10255/job_612/cpuset.cpus
0-16,18,20,22

Ideally the job should get CPUs 0-19, i.e. an even 10/10 split across the two
NUMA nodes, instead of the lopsided 12 even / 8 odd split above.
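
For what it's worth, the binding can also be checked from inside the
allocation with plain Linux tools rather than the cgroup file, e.g.:

# Both report the cpuset the current shell is confined to
grep Cpus_allowed_list /proc/self/status
taskset -cp $$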

I've tried forcing it with an explicit CPU map:

[scrosby at thespian ~]$ sinteractive -n 20 --gres=gpu:p100:4 --cpu_bind=map_cpu:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
srun: job 614 queued and waiting for resources
srun: job 614 has been allocated resources

But the resulting CPU binding is still the same:

[scrosby at host ~]$ cat /sys/fs/cgroup/cpuset/slurm/uid_10255/job_614/cpuset.cpus
0-16,18,20,22
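
For reference, the request can also be expressed through srun directly with
verbose binding, so the CPU mask that gets applied to each task is printed
(this assumes plain srun rather than the sinteractive wrapper, and uses the
documented verbose,map_cpu form of --cpu-bind):

# Same request via srun; --cpu-bind=verbose prints the applied CPU mask per task
srun -n 20 --gres=gpu:p100:4 \
     --cpu-bind=verbose,map_cpu:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19 \
     --pty bash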


Is there any way to force the CPU bindings of a particular job?

Cheers,
Sean