[slurm-users] Example 16 of CPU Management User and Administrator Guide does not work.

Uemoto, Tomoki fj2770fj at aa.jp.fujitsu.com
Thu Nov 21 01:13:41 UTC 2019


Thank you. I now have a deeper understanding of this topic.

It looks like there is no problem when the verbose ('v') option of --cpu-bind is not used.

    [test@ohpc137pbsop-sms ~]$ srun --nodes=1-1 --ntasks=6 --cpu-bind=cores cat /proc/self/status | grep Cpus_allowed_list
    Cpus_allowed_list:      0-1,12,24-25,36
    Cpus_allowed_list:      0-1,12,24-25,36
    Cpus_allowed_list:      0-1,12,24-25,36
    Cpus_allowed_list:      0-1,12,24-25,36
    Cpus_allowed_list:      0-1,12,24-25,36
    Cpus_allowed_list:      0-1,12,24-25,36
    [test@ohpc137pbsop-sms ~]$
    
This is an old Bugzilla entry, but it covers the following:

    https://bugs.schedmd.com/show_bug.cgi?id=688
    * An indication should be made when a cpu binding is rejected, ideally with the 
    reason why. Perhaps only when --cpu_bind=verbose is used.

Therefore, I think this is a problem with the verbose mode of --cpu-bind when 'TaskPlugin=task/cgroup' is configured.

Even with the NodeName parameters set to match lscpu (CoresPerSocket=12, ThreadsPerCore=2),
the result was the same.

    [root@ohpc137pbsop-sms ~]# grep ^NodeName /etc/slurm/slurm.conf
    NodeName=ohpc137pbsop-c001 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 State=UNKNOWN
    NodeName=ohpc137pbsop-c002 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 State=UNKNOWN
    NodeName=ohpc137pbsop-c003 NodeAddr=172.16.20.103 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 State=UNKNOWN
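As a quick sanity check on these NodeName lines: when CPUs is not given, Slurm derives the node's logical CPU count from the product of the topology fields, so 2 sockets x 12 cores x 2 threads gives 48, which should agree with the CPU(s) line of lscpu if Sockets=2 also matches. The sketch below is only an illustration; `parse_node_line` and `logical_cpus` are hypothetical helpers, not Slurm tools, and treating a missing field as 1 is an assumption of the sketch.

```python
def parse_node_line(line: str) -> dict[str, str]:
    """Split a slurm.conf NodeName line into its key=value fields."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # skip anything that is not key=value
            fields[key] = value
    return fields


def logical_cpus(line: str) -> int:
    """Sockets x CoresPerSocket x ThreadsPerCore (missing fields assumed to be 1)."""
    f = parse_node_line(line)
    sockets = int(f.get("Sockets", 1))
    cores = int(f.get("CoresPerSocket", 1))
    threads = int(f.get("ThreadsPerCore", 1))
    return sockets * cores * threads


new_cfg = "NodeName=ohpc137pbsop-c001 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 State=UNKNOWN"
old_cfg = "NodeName=ohpc137pbsop-c001 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 Procs=8 State=UNKNOWN"
print(logical_cpus(new_cfg))  # 48
print(logical_cpus(old_cfg))  # 8
```

The second line is the earlier configuration quoted in the reply below; it describes only 8 logical CPUs, far fewer than the hardware provides.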

    [test@ohpc137pbsop-sms ~]$ srun --nodes=1-1 --ntasks=6 --cpu-bind=v,cores cat /proc/self/status | grep Cpus_allowed_list
    task/cgroup: task[4] not enough Core objects (3 < 6), disabling affinity
    task/cgroup: task[2] not enough Core objects (3 < 6), disabling affinity
    task/cgroup: task[3] not enough Core objects (3 < 6), disabling affinity
    task/cgroup: task[0] not enough Core objects (3 < 6), disabling affinity
    task/cgroup: task[5] not enough Core objects (3 < 6), disabling affinity
    task/cgroup: task[1] not enough Core objects (3 < 6), disabling affinity
    Cpus_allowed_list:      0-1,12,24-25,36
    Cpus_allowed_list:      0-1,12,24-25,36
    Cpus_allowed_list:      0-1,12,24-25,36
    Cpus_allowed_list:      0-1,12,24-25,36
    Cpus_allowed_list:      0-1,12,24-25,36
    Cpus_allowed_list:      0-1,12,24-25,36
    [test@ohpc137pbsop-sms ~]$
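One plausible reading of "not enough Core objects (3 < 6)": the step's CPU set 0-1,12,24-25,36 contains six logical CPUs, but if CPU n and CPU n+24 are hyperthread siblings, those six CPUs sit on only three physical cores, fewer than the six tasks that --cpu-bind=cores wants to place one per core. The n/n+24 sibling numbering is an assumption (common on a 2-socket, 12-core, 2-thread Linux node, but verify it in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list), and `expand_cpu_list` and `physical_cores` are hypothetical helpers that only reproduce the arithmetic, not the task/cgroup plugin's own check.

```python
def expand_cpu_list(spec: str) -> list[int]:
    """Expand a Linux CPU list such as '0-1,12,24-25,36' into CPU IDs."""
    cpus = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        lo, sep, hi = part.partition("-")
        if sep:
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(lo))
    return cpus


# ASSUMPTION: 2 sockets x 12 cores x 2 threads, numbered so that CPU n and
# CPU n + 24 share one physical core. Check thread_siblings_list on the node.
CORES_PER_NODE = 2 * 12


def physical_cores(cpus: list[int], cores_per_node: int = CORES_PER_NODE) -> set[int]:
    """Map each logical CPU to its (assumed) physical core ID."""
    return {cpu % cores_per_node for cpu in cpus}


allowed = expand_cpu_list("0-1,12,24-25,36")
cores = physical_cores(allowed)
ntasks = 6
print(len(allowed), "logical CPUs on", len(cores), "cores")  # 6 logical CPUs on 3 cores
if len(cores) < ntasks:
    print(f"not enough cores ({len(cores)} < {ntasks}) to bind one task per core")
```

Under this assumption the step was allocated three whole cores (both hyperthreads of each), which matches the 3 in the log message.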

Regards,
Tomo


-----Original Message-----
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Mark Hahn
Sent: Thursday, November 21, 2019 12:20 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Example 16 of CPU Management User and Administrator Guide does not work.

>  task/cgroup: task[1] not enough Core objects (4 < 6), disabling affinity
> What does this message mean?
...
| [root@ohpc137pbsop-sms ~]# grep ^NodeName /etc/slurm/slurm.conf
| NodeName=ohpc137pbsop-c001 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 Procs=8 State=UNKNOWN

Could you try CoresPerSocket=12 here, to match the provided lscpu output?
(Normally also ThreadsPerCore=2, since HT is enabled.)

regards,
--
Mark Hahn | SHARCnet Sysadmin | hahn at sharcnet.ca | http://www.sharcnet.ca
           | McMaster RHPCS    | hahn at mcmaster.ca | 905 525 9140 x24687
           | Compute/Calcul Canada                | http://www.computecanada.ca



