[slurm-users] Verbose mode of the 'accel-bind' does not work.
Uemoto, Tomoki
fj2770fj at aa.jp.fujitsu.com
Wed Nov 27 06:47:30 UTC 2019
Hi all,
OS Version: RHEL 7.6
SLURM Version: slurm 18.08.6
I defined the gpu resource as follows:
[test@ohpc137pbsop-c001 ~]$ scontrol show config |grep TaskPlugin
TaskPlugin = task/cgroup
TaskPluginParam = (null type)
[test@ohpc137pbsop-c001 ~]$
[test@ohpc137pbsop-c001 ~]$ grep Gres /etc/slurm/slurm.conf
GresTypes=gpu
NodeName=ohpc137pbsop-c001 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 Gres=gpu:2 State=IDLE
NodeName=ohpc137pbsop-c002 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 Gres=gpu:2 State=IDLE
[test@ohpc137pbsop-c001 ~]$
[test@ohpc137pbsop-c001 ~]$ cat /etc/slurm/gres.conf
Name=gpu File=/dev/tty0 Cores=0,1
Name=gpu File=/dev/tty1 Cores=0,1
[test@ohpc137pbsop-c001 ~]$
[root@ohpc137pbsop-sms ~]# cat /etc/slurm/cgroup.conf
###
#
# Slurm cgroup support configuration file
#
# See man slurm.conf and man cgroup.conf for further
# information on cgroup configuration parameters
#--
ConstrainCores=yes
TaskAffinity=yes
CgroupMountpoint=/cgroup
CgroupAutomount=yes
ConstrainRAMSpace=yes
[root@ohpc137pbsop-sms ~]#
[root@ohpc137pbsop-sms ~]# scontrol show node |grep Gres
Gres=gpu:2
Gres=gpu:2
[root@ohpc137pbsop-sms ~]#
I then ran the following commands:
[test@ohpc137pbsop-sms ~]$ srun -l --gres=gpu:2 -n4 --accel-bind=v,g -l hostname
0: ohpc137pbsop-c001
2: ohpc137pbsop-c002
1: ohpc137pbsop-c001
3: ohpc137pbsop-c002
[test@ohpc137pbsop-sms ~]$ srun -l --gres=gpu:2 -n4 --accel-bind=v -l hostname
2: ohpc137pbsop-c002
0: ohpc137pbsop-c001
3: ohpc137pbsop-c002
1: ohpc137pbsop-c001
[test@ohpc137pbsop-sms ~]$
In both cases, no task binding information is printed.
Is the verbose mode of --accel-bind not supported in this version (Slurm 18.08.6)?
By contrast, the verbose mode of --cpu-bind works and prints the binding, as confirmed below:
[test@ohpc137pbsop-sms ~]$ srun -c1 --cpu-bind=v hostname
cpu-bind=NULL - ohpc137pbsop-c001, task 0 0 [22822]: mask 0x1000001
ohpc137pbsop-c001
cpu-bind=NULL - ohpc137pbsop-c001, task 1 1 [22823]: mask 0x1000001
ohpc137pbsop-c001
[test@ohpc137pbsop-sms ~]$
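For reference, the mask in the --cpu-bind output above can be decoded into CPU indices with a small bash sketch. decode_mask is a hypothetical helper name used only for illustration (it is not a Slurm tool), and it assumes the mask fits in 63 bits, which holds for the 48 logical CPUs configured per node here.

```shell
# decode_mask: hypothetical helper, not part of Slurm.
# Prints the zero-based indices of the bits set in a CPU affinity mask.
decode_mask() {
  local mask=$(( $1 ))   # bash arithmetic accepts the 0x prefix
  local i cpus=()
  for (( i = 0; i < 63; i++ )); do
    if (( (mask >> i) & 1 )); then
      cpus+=("$i")
    fi
  done
  echo "${cpus[*]}"
}

decode_mask 0x1000001   # mask taken from the cpu-bind output above
```

For the mask 0x1000001 this prints "0 24", i.e. bits 0 and 24 are set, so each task was bound to logical CPUs 0 and 24.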