[slurm-users] Verbose mode of the 'accel-bind' does not work.

Danny Rotscher danny.rotscher at tu-dresden.de
Fri Mar 6 13:02:53 UTC 2020


Dear all,

we have the same problem on RHEL 7.7 with Slurm 19.05.5.
Can anyone help us find a solution?

We are currently using "SelectType=select/cons_res". Might we
need "SelectType=select/cons_tres" instead?

Kind regards,
Danny Rotscher

On 27.11.19 at 07:47, Uemoto, Tomoki wrote:
> Hi, all
>
> OS Version: RHEL 7.6
> SLURM Version: slurm 18.08.6
>
> I defined the gpu resource as follows:
>
>    [test@ohpc137pbsop-c001 ~]$ scontrol show config |grep TaskPlugin
>    TaskPlugin              = task/cgroup
>    TaskPluginParam         = (null type)
>    [test@ohpc137pbsop-c001 ~]$
>
>    [test@ohpc137pbsop-c001 ~]$ grep Gres /etc/slurm/slurm.conf
>    GresTypes=gpu
>    NodeName=ohpc137pbsop-c001 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 Gres=gpu:2 State=IDLE
>    NodeName=ohpc137pbsop-c002 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 Gres=gpu:2 State=IDLE
>    [test@ohpc137pbsop-c001 ~]$
>
>    [test@ohpc137pbsop-c001 ~]$ cat /etc/slurm/gres.conf
>    Name=gpu File=/dev/tty0 Cores=0,1
>    Name=gpu File=/dev/tty1 Cores=0,1
>
>    [test@ohpc137pbsop-c001 ~]$
>
>    [root@ohpc137pbsop-sms ~]# cat /etc/slurm/cgroup.conf
>    ###
>    #
>    # Slurm cgroup support configuration file
>    #
>    # See man slurm.conf and man cgroup.conf for further
>    # information on cgroup configuration parameters
>    #--
>    ConstrainCores=yes
>    TaskAffinity=yes
>    CgroupMountpoint=/cgroup
>    CgroupAutomount=yes
>    ConstrainRAMSpace=yes
>    [root@ohpc137pbsop-sms ~]#
>
>    [root@ohpc137pbsop-sms ~]# scontrol show node |grep Gres
>     Gres=gpu:2
>     Gres=gpu:2
>    [root@ohpc137pbsop-sms ~]#
>
> Then I ran the following commands:
>
>    [test@ohpc137pbsop-sms ~]$ srun -l --gres=gpu:2 -n4 --accel-bind=v,g -l hostname
>    0: ohpc137pbsop-c001
>    2: ohpc137pbsop-c002
>    1: ohpc137pbsop-c001
>    3: ohpc137pbsop-c002
>    [test@ohpc137pbsop-sms ~]$ srun -l --gres=gpu:2 -n4 --accel-bind=v -l hostname
>    2: ohpc137pbsop-c002
>    0: ohpc137pbsop-c001
>    3: ohpc137pbsop-c002
>    1: ohpc137pbsop-c001
>    [test@ohpc137pbsop-sms ~]$
>
>    No task binding information is printed.
>    Is the verbose mode of --accel-bind not supported in this version (Slurm 18.08.6)?
>
>    The verbose mode of --cpu-bind does work, as confirmed here:
>    [test@ohpc137pbsop-sms ~]$ srun -c1 --cpu-bind=v hostname
>    cpu-bind=NULL - ohpc137pbsop-c001, task  0  0 [22822]: mask 0x1000001
>    ohpc137pbsop-c001
>    cpu-bind=NULL - ohpc137pbsop-c001, task  1  1 [22823]: mask 0x1000001
>    ohpc137pbsop-c001
>    [test@ohpc137pbsop-sms ~]$
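
A side note on reading the quoted cpu-bind output: the reported mask is a hexadecimal bitmask in which a set bit N means the task may run on CPU N. The following is only a minimal Python sketch for decoding such a mask. The helper name mask_to_cpus is hypothetical and not part of Slurm.

```python
def mask_to_cpus(mask_hex):
    """Return the CPU IDs whose bits are set in a hexadecimal affinity mask.

    mask_to_cpus is a hypothetical helper for illustration, not a Slurm API.
    """
    mask = int(mask_hex, 16)
    # Bit N set in the mask means CPU N is part of the binding.
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    # Mask taken from the quoted "srun -c1 --cpu-bind=v hostname" output.
    print(mask_to_cpus("0x1000001"))
```

For the mask 0x1000001 this yields CPUs 0 and 24. On a node with 2 sockets, 12 cores per socket and 2 threads per core, that would be consistent with the two hardware threads of a single core, assuming the common Linux numbering in which sibling threads are offset by the total core count.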

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Danny Rotscher
HPC-Support

Technische Universität Dresden
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
01062 Dresden
Tel.: +49 351 463-35853
Fax : +49 351 463-37773
E-Mail: danny.rotscher at tu-dresden.de
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~