[slurm-users] Verbose mode of the 'accel-bind' does not work.

Uemoto, Tomoki fj2770fj at aa.jp.fujitsu.com
Wed Nov 27 06:47:30 UTC 2019


Hi all,

OS Version: RHEL 7.6
SLURM Version: slurm 18.08.6

I defined the GPU resource as follows:

  [test@ohpc137pbsop-c001 ~]$ scontrol show config |grep TaskPlugin
  TaskPlugin              = task/cgroup
  TaskPluginParam         = (null type)
  [test@ohpc137pbsop-c001 ~]$
  
  [test@ohpc137pbsop-c001 ~]$ grep Gres /etc/slurm/slurm.conf
  GresTypes=gpu
  NodeName=ohpc137pbsop-c001 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 Gres=gpu:2 State=IDLE
  NodeName=ohpc137pbsop-c002 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 Gres=gpu:2 State=IDLE
  [test@ohpc137pbsop-c001 ~]$

  [test@ohpc137pbsop-c001 ~]$ cat /etc/slurm/gres.conf
  Name=gpu File=/dev/tty0 Cores=0,1
  Name=gpu File=/dev/tty1 Cores=0,1

  [test@ohpc137pbsop-c001 ~]$

  [root@ohpc137pbsop-sms ~]# cat /etc/slurm/cgroup.conf
  ###
  #
  # Slurm cgroup support configuration file
  #
  # See man slurm.conf and man cgroup.conf for further
  # information on cgroup configuration parameters
  #--
  ConstrainCores=yes
  TaskAffinity=yes
  CgroupMountpoint=/cgroup
  CgroupAutomount=yes
  ConstrainRAMSpace=yes
  [root@ohpc137pbsop-sms ~]#
  
  [root@ohpc137pbsop-sms ~]# scontrol show node |grep Gres
   Gres=gpu:2
   Gres=gpu:2
  [root@ohpc137pbsop-sms ~]#

Then I ran the following commands:

  [test@ohpc137pbsop-sms ~]$ srun -l --gres=gpu:2 -n4 --accel-bind=v,g -l hostname
  0: ohpc137pbsop-c001
  2: ohpc137pbsop-c002
  1: ohpc137pbsop-c001
  3: ohpc137pbsop-c002
  [test@ohpc137pbsop-sms ~]$ srun -l --gres=gpu:2 -n4 --accel-bind=v -l hostname
  2: ohpc137pbsop-c002
  0: ohpc137pbsop-c001
  3: ohpc137pbsop-c002
  1: ohpc137pbsop-c001
  [test@ohpc137pbsop-sms ~]$

No task binding information is printed in either case.
Is the verbose mode of --accel-bind not supported in this version (Slurm 18.08.6)?

For comparison, the verbose mode of --cpu-bind does print binding information:
  [test@ohpc137pbsop-sms ~]$ srun -c1 --cpu-bind=v hostname
  cpu-bind=NULL - ohpc137pbsop-c001, task  0  0 [22822]: mask 0x1000001
  ohpc137pbsop-c001
  cpu-bind=NULL - ohpc137pbsop-c001, task  1  1 [22823]: mask 0x1000001
  ohpc137pbsop-c001
  [test@ohpc137pbsop-sms ~]$
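
For reference, the mask reported by --cpu-bind=v is a hexadecimal bitmask in which bit N stands for CPU N. The following is a minimal sketch for decoding it, assuming bash; mask_to_cpus is a hypothetical helper written for this post, not part of Slurm, and it only handles masks that fit in a 64-bit integer (enough for the 48 CPUs per node configured here).

```shell
# Hypothetical helper (not a Slurm tool): list the CPU indices set in a hex mask.
mask_to_cpus() {
    local mask=$(( $1 ))      # bash arithmetic understands the 0x prefix
    local cpu=0 cpus=""
    while (( mask > 0 )); do
        if (( mask & 1 )); then
            # Append the index, comma-separated after the first entry
            cpus+="${cpus:+,}${cpu}"
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$cpus"
}

mask_to_cpus 0x1000001   # prints 0,24
```

Both tasks above therefore share CPUs 0 and 24. On a node with 2 sockets x 12 cores x 2 threads these are probably the two hardware threads of one core, but that depends on how the kernel numbers the CPUs and is an assumption, not something the output confirms.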

