[slurm-users] CPUSpecList confusion
Paul Raines
raines at nmr.mgh.harvard.edu
Wed Dec 14 17:11:48 UTC 2022
Ugh. Guess I cannot count. The mapping on that last node DOES work with
the "alternating" scheme, where the Slurm CPU ID (left) maps to the cgroup CPU (right):
0 0
1 2
2 4
3 6
4 8
5 10
6 12
7 14
8 16
9 18
10 20
11 22
12 1
13 3
14 5
15 7
16 9
17 11
18 13
19 15
20 17
21 19
22 21
23 23
So CPU_IDs=8-11,20-23 does correspond to cgroup CPUs 16-23.
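A minimal shell sketch of the alternating mapping in the table above. It assumes this node's layout: 2 sockets, 12 cores per socket, ThreadsPerCore=1, and Linux placing even-numbered CPUs on the first socket and odd-numbered CPUs on the second (which is what the table suggests). The helpers `expand_ids` and `slurm_to_linux` are hypothetical names for illustration, not Slurm tools.

```shell
# Hypothetical helpers. Assumes 2 sockets x 12 cores, ThreadsPerCore=1, and
# the alternating Linux layout (even CPUs on socket 0, odd CPUs on socket 1).
CORES_PER_SOCKET=12

# expand_ids LIST: expand a Slurm-style list such as "8-11,20-23"
# into space-separated numbers.
expand_ids() {
    ids=
    for part in $(echo "$1" | tr ',' ' '); do
        case $part in
            *-*) first=${part%-*}; last=${part#*-} ;;
            *)   first=$part; last=$part ;;
        esac
        i=$first
        while [ "$i" -le "$last" ]; do
            ids="$ids $i"
            i=$((i + 1))
        done
    done
    echo $ids
}

# slurm_to_linux LIST: map each abstract Slurm CPU ID to a Linux CPU number.
slurm_to_linux() {
    cpus=
    for id in $(expand_ids "$1"); do
        if [ "$id" -lt "$CORES_PER_SOCKET" ]; then
            cpus="$cpus $((2 * id))"                           # socket 0: even
        else
            cpus="$cpus $((2 * (id - CORES_PER_SOCKET) + 1))"  # socket 1: odd
        fi
    done
    echo $cpus
}
```

For example, `slurm_to_linux 8-11,20-23` prints `16 18 20 22 17 19 21 23`, the same set as the cgroup's `16-23`, and `slurm_to_linux 12-19` prints `1 3 5 7 9 11 13 15`.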
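Using this script:
cd /sys/fs/cgroup/cpuset/slurm
# job directories look like ./uid_<uid>/job_<jobid>
for d in $(find . -type d -name 'job_*') ; do
  j=${d##*job_}
  echo "=== $j"
  # print only the CPU_IDs=... token, wherever it sits on the line
  scontrol -d show job "$j" | grep -o 'CPU_IDs=[^ ]*'
  cat "$d/cpuset.effective_cpus"
done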
=== 1967214
CPU_IDs=8-11,20-23
16-23
=== 1960208
CPU_IDs=12-19
1,3,5,7,9,11,13,15
=== 1966815
CPU_IDs=0
0
=== 1966821
CPU_IDs=6
12
=== 1966818
CPU_IDs=3
6
=== 1966816
CPU_IDs=1
2
=== 1966822
CPU_IDs=7
14
=== 1966820
CPU_IDs=5
10
=== 1966819
CPU_IDs=4
8
=== 1966817
CPU_IDs=2
4
On all my nodes I see just two schemes: the alternating odd/even one
above, and one that does not alternate, as on this box with
CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=1
=== 1966495
CPU_IDs=0-2
0-2
=== 1966498
CPU_IDs=10-12
10-12
=== 1966502
CPU_IDs=26-28
26-28
=== 1960064
CPU_IDs=7-9,13-25
7-9,13-25
=== 1954480
CPU_IDs=3-6
3-6
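Both schemes, and the 64-CPU hyperthreaded nodes quoted further down, appear consistent with one rule: Slurm numbers CPUs socket by socket, core by core, with hyperthread siblings adjacent, while Linux numbers them in whatever order the firmware enumerates them. That rule is an inference from the CPU_IDs/cgroup comparisons in this thread, not something taken from Slurm's source (Slurm normally gets the topology from hwloc, whose core order may not match the sysfs `core_id` order). Under that assumption, a sketch that derives the abstract-to-Linux table from sysfs; `sysfs_topology` and `slurm_order` are hypothetical helper names.

```shell
# Hypothetical helpers; the ordering rule is an assumption inferred from
# the CPU_IDs-versus-cgroup comparisons in this thread.

# sysfs_topology: print "linux_cpu package_id core_id" for every CPU
# that exposes a readable topology directory.
sysfs_topology() {
    for c in /sys/devices/system/cpu/cpu[0-9]*; do
        [ -r "$c/topology/core_id" ] || continue
        printf '%s %s %s\n' "${c##*/cpu}" \
            "$(cat "$c/topology/physical_package_id")" \
            "$(cat "$c/topology/core_id")"
    done
}

# slurm_order: read "linux_cpu package_id core_id" lines on stdin and print
# "abstract_id linux_cpu", ordering by socket, then core, then Linux CPU
# number so that hyperthread siblings end up adjacent.
slurm_order() {
    sort -n -k2,2 -k3,3 -k1,1 | awk '{ print NR - 1, $1 }'
}
```

On a node, `sysfs_topology | slurm_order` prints the table for that node; if the assumption holds, reading it from right to left gives the abstract IDs to use in CpuSpecList for a chosen set of Linux CPUs.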
On Wed, 14 Dec 2022 9:42am, Paul Raines wrote:
>
> Yes, I see that on some of my other machines too. So apicid is definitely
> not what SLURM is using; it just happens to line up that way on this one
> machine I have.
>
> I think the issue is that cgroups numbers all the cores on the first socket
> starting at 0, then all the cores on the second socket. But SLURM (on a
> two-socket box) numbers 0 as the first core on the first socket, 1 as the
> first core on the second socket, 2 as the second core on the first socket,
> 3 as the second core on the second socket, and so on. (Looks like I am
> wrong; see below.)
>
> Why SLURM does this instead of just using the numbering cgroups uses, I
> have no idea. Hopefully one of the SLURM developers reads this and can
> explain.
>
> Looking at another SLURM node I have (where cgroups v1 is still in use
> and HT is turned off) with the definition
>
> CPUs=24 Boards=1 SocketsPerBoard=2 CoresPerSocket=12 ThreadsPerCore=1
>
> I find
>
> [root@r440-17 ~]# egrep '^(apicid|proc)' /proc/cpuinfo | tail -4
> processor : 22
> apicid : 22
> processor : 23
> apicid : 54
>
> So apicids are NOT going to work.
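To line up processor numbers, sockets, core IDs and apicids side by side, a small awk sketch over the standard x86 /proc/cpuinfo fields (`processor`, `physical id`, `core id`, `apicid`). `cpu_table` is a hypothetical helper name, and non-x86 kernels may not print these fields.

```shell
# cpu_table [FILE]: print "processor physical_id core_id apicid" for each
# CPU in an x86 /proc/cpuinfo (or in FILE, e.g. a copy saved from a node).
cpu_table() {
    awk -F': *' '
        /^processor/   { p = $2 }
        /^physical id/ { s = $2 }
        /^core id/     { c = $2 }
        /^apicid/      { a = $2 }
        /^$/ && p != "" { print p, s, c, a; p = "" }
        END { if (p != "") print p, s, c, a }
    ' "${1:-/proc/cpuinfo}"
}
```

Piping the result through `sort -k2,2n -k3,3n` groups the CPUs by socket and core, which makes it easier to see whether a node uses the alternating or the contiguous layout.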
>
> # scontrol -d show job 1966817 | grep CPU_ID
> Nodes=r17 CPU_IDs=2 Mem=16384 GRES=
> # cat /sys/fs/cgroup/cpuset/slurm/uid_3776056/job_1966817/cpuset.cpus
> 4
>
> If Slurm has '2', this should be the second core on the first socket, so it
> should be '1' in cgroups. But as we see above it is 4, which is the fifth
> core on the first socket. So I guess I was wrong above.
>
> But in /proc/cpuinfo the apicid for processor 4 is 2!!! So is apicid
> right after all? Nope, on the same machine I have
>
> # scontrol -d show job 1960208 | grep CPU_ID
> Nodes=r17 CPU_IDs=12-19 Mem=51200 GRES=
> # cat /sys/fs/cgroup/cpuset/slurm/uid_5164679/job_1960208/cpuset.cpus
> 1,3,5,7,9,11,13,15
>
> and in /proc/cpuinfo the apicid for processor 12 is 16
>
> # scontrol -d show job 1967214 | grep CPU_ID
> Nodes=r17 CPU_IDs=8-11,20-23 Mem=51200 GRES=
> # cat /sys/fs/cgroup/cpuset/slurm/uid_5164679/job_1967214/cpuset.cpus
> 16-23
>
> I am totally lost now. Seems totally random. SLURM devs? Any insight?
>
>
> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>
>
>
> On Wed, 14 Dec 2022 1:33am, Marcus Wagner wrote:
>
>> Hi Paul,
>>
>> Sorry to say, but that has to be a coincidence on your system. I've
>> never seen Slurm report core numbers higher than the total number of
>> cores.
>>
>> For example, I have an Intel Platinum 8160 here: 24 cores per socket, no
>> Hyper-Threading enabled.
>> Yet here are the last lines of /proc/cpuinfo:
>>
>> processor : 43
>> apicid : 114
>> processor : 44
>> apicid : 116
>> processor : 45
>> apicid : 118
>> processor : 46
>> apicid : 120
>> processor : 47
>> apicid : 122
>>
>> I have never seen Slurm report core numbers > 96 for a job.
>> Nonetheless, I agree that the cores reported by Slurm mostly have nothing
>> to do with the cores reported by, e.g., cgroups.
>> Since Slurm creates the cgroups for the jobs, it should know which cores
>> are used, so I wonder why it reports some kind of abstract core ID.
>>
>> Best
>> Marcus
>>
>> On 13.12.2022 at 16:39, Paul Raines wrote:
>>>
>>> Yes, it looks like SLURM is using the apicid from /proc/cpuinfo.
>>> The first 14 CPUs in /proc/cpuinfo (processors 0-13) have apicids
>>> 0,2,4,6,8,10,12,14,16,20,22,24,26,28.
>>>
>>> So after setting CpuSpecList=0,2,4,6,8,10,12,14,16,18,20,22,24,26
>>> in slurm.conf, it appears to be doing what I want:
>>>
>>> $ echo $SLURM_JOB_ID
>>> 9
>>> $ grep -i ^cpu /proc/self/status
>>> Cpus_allowed: 000f0000,000f0000
>>> Cpus_allowed_list: 16-19,48-51
>>> $ scontrol -d show job 9 | grep CPU_ID
>>> Nodes=larkin CPU_IDs=32-39 Mem=25600 GRES=
>>>
>>> apicid=32 is processor=16 and apicid=33 is processor=48 in /proc/cpuinfo.
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>> On Tue, 13 Dec 2022 9:52am, Sean Maxwell wrote:
>>>
>>>> In the slurm.conf manual they state that the CpuSpecList IDs are
>>>> "abstract", and the CPU management docs reinforce the notion that the
>>>> abstract Slurm IDs are not related to the Linux hardware IDs, so that is
>>>> probably the source of the behavior. Unfortunately, I don't have more
>>>> information.
>>>>
>>>> On Tue, Dec 13, 2022 at 9:45 AM Paul Raines <raines at nmr.mgh.harvard.edu> wrote:
>>>>
>>>>>
>>>>> Hmm. Actually, it looks like there is confusion between the CPU IDs on
>>>>> the system and what SLURM thinks the IDs are:
>>>>>
>>>>> # scontrol -d show job 8
>>>>> ...
>>>>> Nodes=foobar CPU_IDs=14-21 Mem=25600 GRES=
>>>>> ...
>>>>>
>>>>> # cat /sys/fs/cgroup/system.slice/slurmstepd.scope/job_8/cpuset.cpus.effective
>>>>> 7-10,39-42
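The cpuset file sits in different places under cgroup v1 and v2. A sketch of a lookup that tries both layouts seen in this thread (`system.slice/slurmstepd.scope/job_N/cpuset.cpus.effective` for v2, `cpuset/slurm/uid_U/job_N/cpuset.effective_cpus` for v1). `job_cpuset` is a hypothetical helper, and the paths are assumptions that may differ with other Slurm versions or cgroup settings.

```shell
# job_cpuset JOBID [ROOT]: print the effective cpuset of a Slurm job,
# trying the cgroup v2 and v1 layouts seen in this thread.
# ROOT defaults to /sys/fs/cgroup.
job_cpuset() {
    root=${2:-/sys/fs/cgroup}
    # cgroup v2 layout
    f=$root/system.slice/slurmstepd.scope/job_$1/cpuset.cpus.effective
    if [ -r "$f" ]; then
        cat "$f"
        return 0
    fi
    # cgroup v1 layout; the uid_* level is not known in advance
    for f in "$root"/cpuset/slurm/uid_*/job_"$1"/cpuset.effective_cpus; do
        if [ -r "$f" ]; then
            cat "$f"
            return 0
        fi
    done
    echo "no cpuset found for job $1" >&2
    return 1
}
```

Running `scontrol -d show job 8 | grep -o 'CPU_IDs=[^ ]*'` followed by `job_cpuset 8` prints Slurm's abstract IDs next to the Linux CPUs of the same job.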
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, 13 Dec 2022 9:40am, Paul Raines wrote:
>>>>>
>>>>> >
>>>>> > Oh, but that does explain the CfgTRES=cpu=14. With the CpuSpecList
>>>>> > below and SlurmdOffSpec I do get CfgTRES=cpu=50, so that makes sense.
>>>>> >
>>>>> > The issue remains that, although the number of CPUs in CpuSpecList
>>>>> > is taken into account, the exact IDs seem to be ignored.
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Tue, 13 Dec 2022 9:34am, Paul Raines wrote:
>>>>> >
>>>>> >>
>>>>> >> I have tried it both ways with the same result. The assigned CPUs
>>>>> >> fall both inside and outside the range given to CpuSpecList.
>>>>> >>
>>>>> >> I tried using commas instead of ranges, i.e.
>>>>> >>
>>>>> >> CpuSpecList=0,1,2,3,4,5,6,7,8,9,10,11,12,13
>>>>> >>
>>>>> >> But it still does not work:
>>>>> >>
>>>>> >> $ srun -p basic -N 1 --ntasks-per-node=1 --mem=25G \
>>>>> >> --time=10:00:00 --cpus-per-task=8 --pty /bin/bash
>>>>> >> $ grep -i ^cpu /proc/self/status
>>>>> >> Cpus_allowed: 00000780,00000780
>>>>> >> Cpus_allowed_list: 7-10,39-42
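Cpus_allowed and Cpus_allowed_list carry the same information: the kernel prints the mask as comma-separated 32-bit hex words, most significant word first. A sketch that decodes such a mask into CPU numbers (`mask_to_cpus` is a hypothetical helper name):

```shell
# mask_to_cpus MASK: list the CPU numbers set in a Cpus_allowed-style mask,
# i.e. comma-separated 32-bit hex words, most significant word first.
mask_to_cpus() {
    rest=$1
    base=0
    cpus=
    while [ -n "$rest" ]; do
        word=${rest##*,}            # rightmost (least significant) word
        case $rest in
            *,*) rest=${rest%,*} ;; # keep the higher words for later
            *)   rest= ;;
        esac
        v=$((0x$word))
        bit=0
        while [ "$bit" -lt 32 ]; do
            if [ $(( (v >> bit) & 1 )) -eq 1 ]; then
                cpus="$cpus $((base + bit))"
            fi
            bit=$((bit + 1))
        done
        base=$((base + 32))
    done
    echo $cpus
}
```

`mask_to_cpus 00000780,00000780` prints `7 8 9 10 39 40 41 42`, matching the `7-10,39-42` in Cpus_allowed_list.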
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Mon, 12 Dec 2022 10:21am, Sean Maxwell wrote:
>>>>> >>
>>>>> >>> Hi Paul,
>>>>> >>>
>>>>> >>>> Nodename=foobar \
>>>>> >>>>   CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 \
>>>>> >>>>   RealMemory=256312 MemSpecLimit=32768 CpuSpecList=14-63 \
>>>>> >>>>   TmpDisk=6000000 Gres=gpu:nvidia_rtx_a6000:1
>>>>> >>>>
>>>>> >>>> The slurm.conf also has:
>>>>> >>>>
>>>>> >>>> ProctrackType=proctrack/cgroup
>>>>> >>>> TaskPlugin=task/affinity,task/cgroup
>>>>> >>>> TaskPluginParam=Cores,SlurmdOffSpec,Verbose
>>>>> >>>>
>>>>> >>>
>>>>> >>> Doesn't setting SlurmdOffSpec tell slurmd that it should NOT use the
>>>>> >>> CPUs in the spec list?
>>>>> >>> (https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdOffSpec)
>>>>> >>> In this case, I believe it uses what is left, which is 0-13. We are
>>>>> >>> just starting to work on this ourselves, and were looking at this
>>>>> >>> setting.
>>>>> >>>
>>>>> >>> Best,
>>>>> >>>
>>>>> >>> -Sean
>>>>> >>>
>>>>> >>
>>>>> >
>>>>> The information in this e-mail is intended only for the person to whom
>>>>> it is addressed. If you believe this e-mail was sent to you in error and
>>>>> the e-mail contains patient information, please contact the Mass General
>>>>> Brigham Compliance HelpLine at
>>>>> https://www.massgeneralbrigham.org/complianceline .
>>>>> Please note that this e-mail is not secure (encrypted). If you do not
>>>>> wish to continue communication over unencrypted e-mail, please notify
>>>>> the sender of this message immediately. Continuing to send or respond to
>>>>> e-mail after receiving this message means you understand and accept this
>>>>> risk and wish to continue to communicate over unencrypted e-mail.
>>>>>
>>>>>
>>>
>>
>> --
>> Dipl.-Inf. Marcus Wagner
>>
>> IT Center
>> Group: Server, Storage, HPC
>> Department: Systems and Operations
>> RWTH Aachen University
>> Seffenter Weg 23
>> 52074 Aachen
>> Tel: +49 241 80-24383
>> Fax: +49 241 80-624383
>> wagner at itc.rwth-aachen.de
>> www.itc.rwth-aachen.de
>>
>> IT Center social media channels:
>> https://blog.rwth-aachen.de/itc/
>> https://www.facebook.com/itcenterrwth
>> https://www.linkedin.com/company/itcenterrwth
>> https://twitter.com/ITCenterRWTH
>> https://www.youtube.com/channel/UCKKDJJukeRwO0LP-ac8x8rQ
>>
>