[slurm-users] Example 16 of CPU Management User and Administrator Guide does not work.
Uemoto, Tomoki
fj2770fj at aa.jp.fujitsu.com
Thu Nov 21 01:13:41 UTC 2019
Thank you. I now have a deeper understanding of this topic.
It looks like there is no problem when --cpu-bind is used without the verbose (v) option:
[test at ohpc137pbsop-sms ~]$ srun --nodes=1-1 --ntasks=6 --cpu-bind=cores cat /proc/self/status | grep Cpus_allowed_list
Cpus_allowed_list: 0-1,12,24-25,36
Cpus_allowed_list: 0-1,12,24-25,36
Cpus_allowed_list: 0-1,12,24-25,36
Cpus_allowed_list: 0-1,12,24-25,36
Cpus_allowed_list: 0-1,12,24-25,36
Cpus_allowed_list: 0-1,12,24-25,36
[test at ohpc137pbsop-sms ~]$
This is an old Bugzilla entry, but the behavior seems to fall under the following item:
https://bugs.schedmd.com/show_bug.cgi?id=688
* An indication should be made when a cpu binding is rejected, ideally with the
reason why. Perhaps only when --cpu_bind=verbose is used.
Therefore, I think this is a problem with the verbose mode of --cpu-bind when 'TaskPlugin=task/cgroup' is set.
Even after I changed the NodeName parameters (CoresPerSocket=12, ThreadsPerCore=2) to match lscpu,
the result was the same.
[root at ohpc137pbsop-sms ~]# grep ^NodeName /etc/slurm/slurm.conf
NodeName=ohpc137pbsop-c001 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 State=UNKNOWN
NodeName=ohpc137pbsop-c002 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 State=UNKNOWN
NodeName=ohpc137pbsop-c003 NodeAddr=172.16.20.103 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 State=UNKNOWN
[test at ohpc137pbsop-sms ~]$ srun --nodes=1-1 --ntasks=6 --cpu-bind=v,cores cat /proc/self/status | grep Cpus_allowed_list
task/cgroup: task[4] not enough Core objects (3 < 6), disabling affinity
task/cgroup: task[2] not enough Core objects (3 < 6), disabling affinity
task/cgroup: task[3] not enough Core objects (3 < 6), disabling affinity
task/cgroup: task[0] not enough Core objects (3 < 6), disabling affinity
task/cgroup: task[5] not enough Core objects (3 < 6), disabling affinity
task/cgroup: task[1] not enough Core objects (3 < 6), disabling affinity
Cpus_allowed_list: 0-1,12,24-25,36
Cpus_allowed_list: 0-1,12,24-25,36
Cpus_allowed_list: 0-1,12,24-25,36
Cpus_allowed_list: 0-1,12,24-25,36
Cpus_allowed_list: 0-1,12,24-25,36
Cpus_allowed_list: 0-1,12,24-25,36
[test at ohpc137pbsop-sms ~]$
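As a rough check of the "(3 < 6)" figure in the messages above, here is a minimal Python sketch that expands the Cpus_allowed_list value and counts the distinct cores behind it. It assumes that logical CPU n and CPU n+24 are hyperthread siblings of the same core (24 = Sockets=2 x CoresPerSocket=12); that numbering is common on Linux but is not confirmed for this node, and the function names are only illustrative.

```python
def parse_cpu_list(text):
    """Expand a Linux CPU list such as '0-1,12,24-25,36' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def count_cores(cpus, total_cores):
    """Count distinct physical cores.

    ASSUMPTION: logical CPU n and CPU n + total_cores are hyperthread
    siblings of the same core. Check this against lscpu on the real node.
    """
    return len({cpu % total_cores for cpu in cpus})


# Value reported by every task in the srun output above.
cpus = parse_cpu_list("0-1,12,24-25,36")
print(len(cpus), "logical CPUs:", cpus)

# 2 sockets x 12 cores per socket, taken from the NodeName lines.
print(count_cores(cpus, total_cores=24), "distinct cores")
```

Under that assumption the six allowed logical CPUs sit on only three cores, which would be consistent with task/cgroup reporting 3 Core objects for 6 tasks.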
Regards,
Tomo
-----Original Message-----
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Mark Hahn
Sent: Thursday, November 21, 2019 12:20 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Example 16 of CPU Management User and Administrator Guide does not work.
> task/cgroup: task[1] not enough Core objects (4 < 6), disabling affinity
> What does this message mean?
...
| [root at ohpc137pbsop-sms ~]# grep ^NodeName /etc/slurm/slurm.conf
| NodeName=ohpc137pbsop-c001 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 Procs=8 State=UNKNOWN
could you try CoresPerSocket=12 here, to match the provided lscpu?
(normally also ThreadsPerCore=2, since HT is enabled.)
regards,
--
Mark Hahn | SHARCnet Sysadmin | hahn at sharcnet.ca | http://www.sharcnet.ca
| McMaster RHPCS | hahn at mcmaster.ca | 905 525 9140 x24687
| Compute/Calcul Canada | http://www.computecanada.ca