[slurm-users] Verbose mode of the 'accel-bind' does not work.
Danny Rotscher
danny.rotscher at tu-dresden.de
Fri Mar 6 13:02:53 UTC 2020
Dear all,
We have the same problem on RHEL 7.7 with Slurm 19.05.5.
Can any of you help us find a solution?
We are currently using the parameter "SelectType=select/cons_res". Might we
need "SelectType=select/cons_tres" instead?
Kind regards,
Danny Rotscher
On 27.11.19 at 07:47, Uemoto, Tomoki wrote:
> Hi, all
>
> OS Version: RHEL 7.6
> SLURM Version: slurm 18.08.6
>
> I defined the gpu resource as follows:
>
> [test@ohpc137pbsop-c001 ~]$ scontrol show config |grep TaskPlugin
> TaskPlugin = task/cgroup
> TaskPluginParam = (null type)
> [test@ohpc137pbsop-c001 ~]$
>
> [test@ohpc137pbsop-c001 ~]$ grep Gres /etc/slurm/slurm.conf
> GresTypes=gpu
> NodeName=ohpc137pbsop-c001 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 Gres=gpu:2 State=IDLE
> NodeName=ohpc137pbsop-c002 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 Gres=gpu:2 State=IDLE
> [test@ohpc137pbsop-c001 ~]$
>
> [test@ohpc137pbsop-c001 ~]$ cat /etc/slurm/gres.conf
> Name=gpu File=/dev/tty0 Cores=0,1
> Name=gpu File=/dev/tty1 Cores=0,1
>
> [test@ohpc137pbsop-c001 ~]$
>
> [root@ohpc137pbsop-sms ~]# cat /etc/slurm/cgroup.conf
> ###
> #
> # Slurm cgroup support configuration file
> #
> # See man slurm.conf and man cgroup.conf for further
> # information on cgroup configuration parameters
> #--
> ConstrainCores=yes
> TaskAffinity=yes
> CgroupMountpoint=/cgroup
> CgroupAutomount=yes
> ConstrainRAMSpace=yes
> [root@ohpc137pbsop-sms ~]#
>
> [root@ohpc137pbsop-sms ~]# scontrol show node |grep Gres
> Gres=gpu:2
> Gres=gpu:2
> [root@ohpc137pbsop-sms ~]#
>
> Then I ran the following commands:
>
> [test@ohpc137pbsop-sms ~]$ srun -l --gres=gpu:2 -n4 --accel-bind=v,g -l hostname
> 0: ohpc137pbsop-c001
> 2: ohpc137pbsop-c002
> 1: ohpc137pbsop-c001
> 3: ohpc137pbsop-c002
> [test@ohpc137pbsop-sms ~]$ srun -l --gres=gpu:2 -n4 --accel-bind=v -l hostname
> 2: ohpc137pbsop-c002
> 0: ohpc137pbsop-c001
> 3: ohpc137pbsop-c002
> 1: ohpc137pbsop-c001
> [test@ohpc137pbsop-sms ~]$
>
> No task binding information is printed.
> Is the verbose mode of accel-bind not supported in this version (Slurm 18.08.6)?
>
> The verbose mode of cpu-bind, by contrast, works as expected:
> [test@ohpc137pbsop-sms ~]$ srun -c1 --cpu-bind=v hostname
> cpu-bind=NULL - ohpc137pbsop-c001, task 0 0 [22822]: mask 0x1000001
> ohpc137pbsop-c001
> cpu-bind=NULL - ohpc137pbsop-c001, task 1 1 [22823]: mask 0x1000001
> ohpc137pbsop-c001
> [test@ohpc137pbsop-sms ~]$
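
For reference, the mask printed by the working cpu-bind verbose output can be decoded by hand: 0x1000001 has bits 0 and 24 set, so each task is bound to CPUs 0 and 24. On a node with Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 these are plausibly the two hardware threads of one core, but that depends on the node's CPU numbering and is only an assumption here. A minimal sketch in POSIX shell (the function name mask_to_cpus is made up for illustration and is not part of Slurm):

```shell
# Decode a CPU affinity mask, as printed by "srun --cpu-bind=v",
# into a space-separated list of CPU indices.
# mask_to_cpus is a hypothetical helper, not a Slurm command.
mask_to_cpus() {
    mask=$(( $1 ))   # accepts hex input such as 0x1000001
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        # The lowest bit tells whether the current CPU index is in the mask.
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out }$cpu"
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$out"
}

mask_to_cpus 0x1000001   # prints: 0 24
```

This only interprets the cpu-bind mask. It says nothing about which GPU a task was bound to, which is the information the accel-bind verbose mode was expected to print.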
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Danny Rotscher
HPC-Support
Technische Universität Dresden
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
01062 Dresden
Tel.: +49 351 463-35853
Fax : +49 351 463-37773
E-Mail: danny.rotscher at tu-dresden.de
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~