[slurm-users] Fwd: srun is not assigning task to a particular logical CPU using slurm

Animesh Kuity animesh2kuity at gmail.com
Sun Nov 12 22:39:17 MST 2017


Dear everyone,

Greetings!!!!

Answer to my post:

Actually, slurmctld uses a best-fit approach over the resources available on
each node; it does not honor the CPU map/mask we specify when assigning
tasks to the logical CPUs.
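
For context, the binding that the task/affinity plugin finally applies to
each task is an ordinary Linux CPU-affinity mask, the same thing taskset
manipulates, so a given mapping can be reproduced and checked outside Slurm.
A minimal sketch (illustration only, not Slurm code):

# Pin a background sleep to logical CPU 9 and read back the mask the kernel
# stores for it; the reported value 200 (hex) is the same "mask 0x200 set"
# that appears in the srun verbose output below.
taskset -c 9 sleep 30 &
taskset -p $!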

I have added/modified code to fulfil this requirement. Here are my
experiment results.

*$ srun -n 4 --cpu_bind=verbose,map_cpu:0,1,8,9 --distribution=block:block
--mem=1024 sleep 10*
cpu_bind=MAP  - clusterhost1, task  0  0 [3334]: mask 0x1 set
cpu_bind=MAP  - clusterhost1, task  1  1 [3335]: mask 0x2 set
cpu_bind=MAP  - clusterhost1, task  2  2 [3336]: mask 0x100 set
cpu_bind=MAP  - clusterhost1, task  3  3 [3337]: mask 0x200 set
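
The masks line up with the requested map: each logical CPU n contributes bit
(1 << n), so map_cpu:0,1,8,9 gives 0x1, 0x2, 0x100 and 0x200. A quick check
(illustration only):

for cpu in 0 1 8 9; do printf "cpu %2d -> mask 0x%x\n" "$cpu" $((1 << cpu)); done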

*$ srun -n 16
--cpu_bind=verbose,map_cpu:0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23
--distribution=block:block --mem=1024 sleep 10*
cpu_bind=MAP  - clusterhost1, task  0  0 [3084]: mask 0x1 set
cpu_bind=MAP  - clusterhost1, task  1  1 [3085]: mask 0x2 set
cpu_bind=MAP  - clusterhost1, task  2  2 [3086]: mask 0x4 set
cpu_bind=MAP  - clusterhost1, task  3  3 [3087]: mask 0x8 set
cpu_bind=MAP  - clusterhost1, task 12 12 [3097]: mask 0x100000 set
cpu_bind=MAP  - clusterhost1, task 15 15 [3100]: mask 0x800000 set
cpu_bind=MAP  - clusterhost1, task  9  9 [3094]: mask 0x20000 set
cpu_bind=MAP  - clusterhost1, task 11 11 [3096]: mask 0x80000 set
cpu_bind=MAP  - clusterhost1, task  4  4 [3088]: mask 0x10 set
cpu_bind=MAP  - clusterhost1, task 10 10 [3095]: mask 0x40000 set
cpu_bind=MAP  - clusterhost1, task 13 13 [3098]: mask 0x200000 set
cpu_bind=MAP  - clusterhost1, task  5  5 [3089]: mask 0x20 set
cpu_bind=MAP  - clusterhost1, task  7  7 [3091]: mask 0x80 set
cpu_bind=MAP  - clusterhost1, task 14 14 [3099]: mask 0x400000 set
cpu_bind=MAP  - clusterhost1, task  8  8 [3093]: mask 0x10000 set
cpu_bind=MAP  - clusterhost1, task  6  6 [3090]: mask 0x40 set

*$ srun -n 16
--cpu_bind=verbose,map_cpu:8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31
--distribution=block:block --mem=1024 sleep 10*
cpu_bind=MAP  - clusterhost1, task  2  2 [3157]: mask 0x400 set
cpu_bind=MAP  - clusterhost1, task  0  0 [3155]: mask 0x100 set
cpu_bind=MAP  - clusterhost1, task  1  1 [3156]: mask 0x200 set
cpu_bind=MAP  - clusterhost1, task  3  3 [3158]: mask 0x800 set
cpu_bind=MAP  - clusterhost1, task  4  4 [3159]: mask 0x1000 set
cpu_bind=MAP  - clusterhost1, task  5  5 [3160]: mask 0x2000 set
cpu_bind=MAP  - clusterhost1, task  6  6 [3161]: mask 0x4000 set
cpu_bind=MAP  - clusterhost1, task 14 14 [3169]: mask 0x40000000 set
cpu_bind=MAP  - clusterhost1, task  7  7 [3162]: mask 0x8000 set
cpu_bind=MAP  - clusterhost1, task 13 13 [3168]: mask 0x20000000 set
cpu_bind=MAP  - clusterhost1, task 12 12 [3167]: mask 0x10000000 set
cpu_bind=MAP  - clusterhost1, task  8  8 [3163]: mask 0x1000000 set
cpu_bind=MAP  - clusterhost1, task  9  9 [3164]: mask 0x2000000 set
cpu_bind=MAP  - clusterhost1, task 15 15 [3170]: mask 0x80000000 set
cpu_bind=MAP  - clusterhost1, task 10 10 [3165]: mask 0x4000000 set
cpu_bind=MAP  - clusterhost1, task 11 11 [3166]: mask 0x8000000 set


*$ srun -n 16 --cpu_bind=verbose --mem=1024 sleep 10*
cpu_bind=MASK - clusterhost1, task  2  2 [3207]: mask 0x2 set
cpu_bind=MASK - clusterhost1, task 15 15 [3220]: mask 0x800000 set
cpu_bind=MASK - clusterhost1, task  3  3 [3208]: mask 0x20000 set
cpu_bind=MASK - clusterhost1, task  4  4 [3209]: mask 0x4 set
cpu_bind=MASK - clusterhost1, task 10 10 [3215]: mask 0x20 set
cpu_bind=MASK - clusterhost1, task 11 11 [3216]: mask 0x200000 set
cpu_bind=MASK - clusterhost1, task 12 12 [3217]: mask 0x40 set
cpu_bind=MASK - clusterhost1, task 13 13 [3218]: mask 0x400000 set
cpu_bind=MASK - clusterhost1, task 14 14 [3219]: mask 0x80 set
cpu_bind=MASK - clusterhost1, task  1  1 [3206]: mask 0x10000 set
cpu_bind=MASK - clusterhost1, task  5  5 [3210]: mask 0x40000 set
cpu_bind=MASK - clusterhost1, task  0  0 [3205]: mask 0x1 set
cpu_bind=MASK - clusterhost1, task  6  6 [3211]: mask 0x8 set
cpu_bind=MASK - clusterhost1, task  7  7 [3212]: mask 0x80000 set
cpu_bind=MASK - clusterhost1, task  9  9 [3214]: mask 0x100000 set
cpu_bind=MASK - clusterhost1, task  8  8 [3213]: mask 0x10 set

*$ srun -n 4 --cpu_bind=verbose --mem=1024 sleep 10*
cpu_bind=MASK - clusterhost1, task  3  3 [3266]: mask 0x20000 set
cpu_bind=MASK - clusterhost1, task  2  2 [3265]: mask 0x2 set
cpu_bind=MASK - clusterhost1, task  0  0 [3263]: mask 0x1 set
cpu_bind=MASK - clusterhost1, task  1  1 [3264]: mask 0x10000 set

On Fri, Oct 27, 2017 at 1:23 PM, Animesh Kuity <animesh2kuity at gmail.com>
wrote:

> Hi everyone,
>
> My objective: I want to assign a few tasks to the logical CPUs belonging to
> a particular socket (e.g., socket 0) and, at another time, assign another
> set of tasks to the logical CPUs belonging to the other socket (e.g.,
> socket 1). In summary, I want to achieve task affinity to particular
> logical CPUs.
>
> slurm version used: slurm 16.05.10-2
>
> slurm.conf to achieve task affinity:
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
> TaskPlugin=task/affinity
> TaskPluginParam=sched
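>
> For reference, the topology slurmd registered for the node can be confirmed
> with something like (clusterhost1 is the node name used in the outputs):
>
> scontrol show node clusterhost1 | grep -E 'Sockets|CoresPerSocket|ThreadsPerCore'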
>
> Node used: Xeon processor; two sockets each having 8 cores with 2
> threads/core
>
> Processor layout (/proc/cpuinfo):
> processor   physical id   core id
> 0,16        0             0
> 1,17        0             1
> 2,18        0             2
> 3,19        0             3
> 4,20        0             4
> 5,21        0             5
> 6,22        0             6
> 7,23        0             7
> 8,24        1             0
> 9,25        1             1
> 10,26       1             2
> 11,27       1             3
> 12,28       1             4
> 13,29       1             5
> 14,30       1             6
> 15,31       1             7
>
> Question: *I am unable to assign all the tasks to the particular logical
> CPUs belonging to socket 0 / socket 1.*
>
> The tasks are always assigned to socket 0 first, irrespective of the
> specified map_cpu, before going to socket 1.
>
> *My observation:*
>
> *$ srun -n 8 --cpu_bind=verbose,map_cpu:0,1,2,3,16,17,18,19
> --distribution=block:block --mem=1024 sleep 100 &*
> [1] 14665
> cpu_bind=MASK - clusterhost1, task  0  0 [14697]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  1  1 [14698]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  4  4 [14701]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  2  2 [14699]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  3  3 [14700]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  5  5 [14702]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  6  6 [14703]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  7  7 [14704]: mask 0xf000f set
> *$ srun bash -c "cat /proc/self/status | grep Cpus_allowed_list"*
> Cpus_allowed_list:    4,20
>
>
> *$ srun -n 8 --cpu_bind=verbose,map_cpu:0,1,2,3,4,5,6,7
> --distribution=block:block --mem=1024 sleep 100 &*
> [1] 14814
> cpu_bind=MASK - clusterhost1, task  1  1 [14847]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  2  2 [14848]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  3  3 [14849]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  0  0 [14846]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  5  5 [14851]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  6  6 [14852]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  4  4 [14850]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  7  7 [14853]: mask 0xf000f set
> *$ srun bash -c "cat /proc/self/status | grep Cpus_allowed_list"*
> Cpus_allowed_list:    4,20
>
> *$ srun -n 20
> --cpu_bind=verbose,map_cpu:0,1,2,3,4,5,6,7,9,10,11,12,13,14,15,16,17,18,19
> --distribution=block:block --mem=1024 sleep 100 &*
> [1] 15688
> cpu_bind=MASK - clusterhost1, task  1  1 [15721]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task  2  2 [15722]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task  4  4 [15724]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task  5  5 [15725]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task  7  7 [15727]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task  0  0 [15720]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task  6  6 [15726]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task  3  3 [15723]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 10 10 [15730]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task  8  8 [15728]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task  9  9 [15729]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 11 11 [15731]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 12 12 [15732]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 14 14 [15734]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 13 13 [15733]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 15 15 [15735]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 16 16 [15736]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 17 17 [15737]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 18 18 [15738]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 19 19 [15739]: mask 0x3ff03ff set
> *$ srun bash -c "cat /proc/self/status | grep Cpus_allowed_list"*
> Cpus_allowed_list:    10,26
>
> *$ srun -n 8 --cpu_bind=verbose,map_cpu:8,9,10,11,24,25,26,27
> --distribution=block:block --mem=1024 sleep 100 &*
> [1] 16816
> cpu_bind=MASK - clusterhost1, task  1  1 [16850]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  4  4 [16853]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  3  3 [16852]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  2  2 [16851]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  0  0 [16849]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  6  6 [16855]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  5  5 [16854]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task  7  7 [16856]: mask 0xf000f set
>
> *$ srun bash -c "cat /proc/self/status | grep Cpus_allowed_list"*
> Cpus_allowed_list:    4,20
>
> *$ srun  --nodes=1 --ntasks=32 --cpu_bind=cores,verbose --label cat
> /proc/self/status | grep Cpus_allowed_list*
> 00: cpu_bind=MASK - clusterhost1, task  0  0 [13955]: mask 0x10001 set
> 01: cpu_bind=MASK - clusterhost1, task  1  1 [13956]: mask 0x20002 set
> 04: cpu_bind=MASK - clusterhost1, task  4  4 [13959]: mask 0x100010 set
> 05: cpu_bind=MASK - clusterhost1, task  5  5 [13960]: mask 0x200020 set
> 06: cpu_bind=MASK - clusterhost1, task  6  6 [13961]: mask 0x400040 set
> 03: cpu_bind=MASK - clusterhost1, task  3  3 [13958]: mask 0x80008 set
> 02: cpu_bind=MASK - clusterhost1, task  2  2 [13957]: mask 0x40004 set
> 09: cpu_bind=MASK - clusterhost1, task  9  9 [13964]: mask 0x2000200 set
> 07: cpu_bind=MASK - clusterhost1, task  7  7 [13962]: mask 0x800080 set
> 10: cpu_bind=MASK - clusterhost1, task 10 10 [13965]: mask 0x4000400 set
> 11: cpu_bind=MASK - clusterhost1, task 11 11 [13966]: mask 0x8000800 set
> 14: cpu_bind=MASK - clusterhost1, task 14 14 [13969]: mask 0x40004000 set
> 15: cpu_bind=MASK - clusterhost1, task 15 15 [13970]: mask 0x80008000 set
> 12: cpu_bind=MASK - clusterhost1, task 12 12 [13967]: mask 0x10001000 set
> 13: cpu_bind=MASK - clusterhost1, task 13 13 [13968]: mask 0x20002000 set
> 08: cpu_bind=MASK - clusterhost1, task  8  8 [13963]: mask 0x1000100 set
> 17: cpu_bind=MASK - clusterhost1, task 17 17 [13972]: mask 0x20002 set
> 16: cpu_bind=MASK - clusterhost1, task 16 16 [13971]: mask 0x10001 set
> 20: cpu_bind=MASK - clusterhost1, task 20 20 [13975]: mask 0x100010 set
> 19: cpu_bind=MASK - clusterhost1, task 19 19 [13974]: mask 0x80008 set
> 18: cpu_bind=MASK - clusterhost1, task 18 18 [13973]: mask 0x40004 set
> 22: cpu_bind=MASK - clusterhost1, task 22 22 [13977]: mask 0x400040 set
> 21: cpu_bind=MASK - clusterhost1, task 21 21 [13976]: mask 0x200020 set
> 24: cpu_bind=MASK - clusterhost1, task 24 24 [13979]: mask 0x1000100 set
> 25: cpu_bind=MASK - clusterhost1, task 25 25 [13980]: mask 0x2000200 set
> 23: cpu_bind=MASK - clusterhost1, task 23 23 [13978]: mask 0x800080 set
> 26: cpu_bind=MASK - clusterhost1, task 26 26 [13981]: mask 0x4000400 set
> 30: cpu_bind=MASK - clusterhost1, task 30 30 [13985]: mask 0x40004000 set
> 31: cpu_bind=MASK - clusterhost1, task 31 31 [13986]: mask 0x80008000 set
> 28: cpu_bind=MASK - clusterhost1, task 28 28 [13983]: mask 0x10001000 set
> 29: cpu_bind=MASK - clusterhost1, task 29 29 [13984]: mask 0x20002000 set
> 27: cpu_bind=MASK - clusterhost1, task 27 27 [13982]: mask 0x8000800 set
> 03: Cpus_allowed_list:    3,19
> 04: Cpus_allowed_list:    4,20
> 01: Cpus_allowed_list:    1,17
> 06: Cpus_allowed_list:    6,22
> 00: Cpus_allowed_list:    0,16
> 02: Cpus_allowed_list:    2,18
> 05: Cpus_allowed_list:    5,21
> 09: Cpus_allowed_list:    9,25
> 10: Cpus_allowed_list:    10,26
> 14: Cpus_allowed_list:    14,30
> 11: Cpus_allowed_list:    11,27
> 15: Cpus_allowed_list:    15,31
> 12: Cpus_allowed_list:    12,28
> 13: Cpus_allowed_list:    13,29
> 17: Cpus_allowed_list:    1,17
> 07: Cpus_allowed_list:    7,23
> 16: Cpus_allowed_list:    0,16
> 08: Cpus_allowed_list:    8,24
> 20: Cpus_allowed_list:    4,20
> 19: Cpus_allowed_list:    3,19
> 18: Cpus_allowed_list:    2,18
> 21: Cpus_allowed_list:    5,21
> 22: Cpus_allowed_list:    6,22
> 24: Cpus_allowed_list:    8,24
> 23: Cpus_allowed_list:    7,23
> 26: Cpus_allowed_list:    10,26
> 30: Cpus_allowed_list:    14,30
> 31: Cpus_allowed_list:    15,31
> 25: Cpus_allowed_list:    9,25
> 28: Cpus_allowed_list:    12,28
> 29: Cpus_allowed_list:    13,29
> 27: Cpus_allowed_list:    11,27
>
>
> *Kindly help me to assign all the tasks to either socket.*
>
> Any kind of help will be appreciated.
>
> Thanks in advance.
>
> --
> Thanks & Regards,
> Animesh Kuity,
> Research Scholar,
> Computer Science department,
> IIT Roorkee
>



-- 
Thanks & Regards,
Animesh Kuity,
Research Scholar,
Computer Science department,
IIT Roorkee