[slurm-users] CPUSpecList confusion
Paul Raines
raines at nmr.mgh.harvard.edu
Wed Dec 14 14:42:23 UTC 2022
Yes, I see that on some of my other machines too. So apicid is definitely
not what SLURM is using but somehow just lines up that way on this one
machine I have.
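As an aside on that machine: the two jobs in the earlier messages quoted below (CPU_IDs=14-21 landing on CPUs 7-10,39-42, and CPU_IDs=32-39 landing on CPUs 16-19,48-51) can be reproduced without apicid at all. The sketch below is only a guess that happens to fit those two outputs. It assumes Slurm numbers the two hardware threads of each core consecutively, while Linux numbers the first thread of every core 0-31 and the second thread 32-63. The function names are made up for illustration and are not part of Slurm.

```python
# Assumed layout (a guess that fits the quoted output, not documented behaviour):
# 2 sockets x 16 cores x 2 threads = 64 CPUs, as in the quoted node definition.
TOTAL_CORES = 32
THREADS_PER_CORE = 2


def abstract_to_linux(abstract_id: int) -> int:
    """Hypothetical mapping from a Slurm abstract CPU ID to a Linux CPU number."""
    # Slurm side (assumed): threads of one core are adjacent, so ID = core * 2 + thread.
    core, thread = divmod(abstract_id, THREADS_PER_CORE)
    # Linux side (assumed): second thread of core c is CPU c + 32.
    return thread * TOTAL_CORES + core


def linux_cpus(abstract_ids):
    """Sorted Linux CPU numbers for a collection of abstract IDs."""
    return sorted(abstract_to_linux(a) for a in abstract_ids)


print(linux_cpus(range(14, 22)))  # job 8: CPU_IDs=14-21
print(linux_cpus(range(32, 40)))  # job 9: CPU_IDs=32-39
```

This prints [7, 8, 9, 10, 39, 40, 41, 42] and [16, 17, 18, 19, 48, 49, 50, 51], matching the two cpuset listings. Two data points are not proof, so treat it as a hypothesis to test against more jobs.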
I think the issue is that cgroups numbers the cores starting at 0 with all
the cores on the first socket, then all the cores on the second socket. But
SLURM (on a two socket box) counts 0 as the first core on the first socket,
1 as the first core on the second socket, 2 as the second core on the first
socket, 3 as the second core on the second socket, and so on. (Looks like I
am wrong: see below)
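To make that guess concrete, the numbering described above can be sketched as follows for a two-socket node with 12 cores per socket and no hyperthreading. The function name is hypothetical, and the mapping is only the hypothesis stated above, not anything taken from the Slurm source or documentation.

```python
SOCKETS = 2
CORES_PER_SOCKET = 12  # assumed node shape: 2 sockets x 12 cores, 1 thread per core


def cgroup_id_if_interleaved(slurm_id: int) -> int:
    """cgroup CPU number predicted by the hypothesis (Slurm alternates sockets)."""
    socket = slurm_id % SOCKETS           # Slurm side: 0, 1, 0, 1, ...
    core_in_socket = slurm_id // SOCKETS  # position of the core within its socket
    # cgroup side (assumed): all cores of socket 0 first, then all cores of socket 1
    return socket * CORES_PER_SOCKET + core_in_socket


for slurm_id in range(4):
    print(slurm_id, "->", cgroup_id_if_interleaved(slurm_id))
```

Under this hypothesis Slurm IDs 0, 1, 2, 3 would map to cgroup CPUs 0, 12, 1, 13, so a job that Slurm reports on CPU 2 would be expected on cgroup CPU 1.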
Why SLURM does this instead of just using the assignments cgroups uses
I have no idea. Hopefully one of the SLURM developers reads this
and can explain.
Looking at another SLURM node I have (where cgroups v1 is still in use
and HT turned off) with definition
CPUs=24 Boards=1 SocketsPerBoard=2 CoresPerSocket=12 ThreadsPerCore=1
I find
[root@r440-17 ~]# egrep '^(apicid|proc)' /proc/cpuinfo | tail -4
processor : 22
apicid : 22
processor : 23
apicid : 54
So apicids are NOT going to work.
# scontrol -d show job 1966817 | grep CPU_ID
Nodes=r17 CPU_IDs=2 Mem=16384 GRES=
# cat /sys/fs/cgroup/cpuset/slurm/uid_3776056/job_1966817/cpuset.cpus
4
If Slurm has '2', this should be the second core on the first socket and so
should be '1' in cgroups, but it is 4 as we see above, which is the fifth
core on the first socket. So I guess I was wrong above.
But in /proc/cpuinfo the apicid for processor 4 is 2!!! So is apicid
right after all? Nope, on the same machine I have
# scontrol -d show job 1960208 | grep CPU_ID
Nodes=r17 CPU_IDs=12-19 Mem=51200 GRES=
# cat /sys/fs/cgroup/cpuset/slurm/uid_5164679/job_1960208/cpuset.cpus
1,3,5,7,9,11,13,15
and in /proc/cpuinfo the apicid for processor 12 is 16
# scontrol -d show job 1967214 | grep CPU_ID
Nodes=r17 CPU_IDs=8-11,20-23 Mem=51200 GRES=
# cat /sys/fs/cgroup/cpuset/slurm/uid_5164679/job_1967214/cpuset.cpus
16-23
I am totally lost now. Seems totally random. SLURM devs? Any insight?
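One mapping that does reproduce all three jobs above on r440-17 is the reverse of the earlier guess: Slurm numbering the cores socket by socket (0-11 on the first socket, 12-23 on the second), and Linux alternating between sockets (even CPU numbers on the first socket, odd on the second). This is an assumption fitted to these three outputs, not confirmed behaviour. The socket of each Linux CPU can be checked with the "physical id" field in /proc/cpuinfo or with lscpu -e. The helper names below are made up for illustration.

```python
SOCKETS = 2
CORES_PER_SOCKET = 12  # from the node definition: SocketsPerBoard=2 CoresPerSocket=12


def abstract_to_linux(abstract_id: int) -> int:
    """Hypothetical mapping: Slurm counts socket by socket, Linux alternates sockets."""
    socket, core_in_socket = divmod(abstract_id, CORES_PER_SOCKET)
    return core_in_socket * SOCKETS + socket


def to_ranges(cpus) -> str:
    """Format a non-empty set of CPU numbers like cpuset.cpus does, e.g. '16-23'."""
    cpus = sorted(cpus)
    parts = []
    start = prev = cpus[0]
    for cpu in cpus[1:]:
        if cpu != prev + 1:  # a gap closes the current run
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = cpu
        prev = cpu
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


# job id -> (CPU_IDs from scontrol, cpuset.cpus from the cgroup), copied from above
observations = {
    1966817: ([2], "4"),
    1960208: (list(range(12, 20)), "1,3,5,7,9,11,13,15"),
    1967214: ([8, 9, 10, 11, 20, 21, 22, 23], "16-23"),
}

for job, (abstract_ids, observed) in observations.items():
    predicted = to_ranges(abstract_to_linux(a) for a in abstract_ids)
    print(job, "predicted", predicted, "observed", observed)
```

For each of the three jobs the predicted string equals the observed cpuset.cpus value, so on this node the assignments are not random, even if three jobs are too few to call the mapping confirmed.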
-- Paul Raines (http://help.nmr.mgh.harvard.edu)
On Wed, 14 Dec 2022 1:33am, Marcus Wagner wrote:
> Hi Paul,
>
> sorry to say, but that has to be a coincidence on your system. I've never
> seen Slurm report core numbers that are higher than the total number of
> cores.
>
> I have e.g. an Intel Platinum 8160 here: 24 cores per socket, no
> HyperThreading activated.
> Yet here are the last lines of /proc/cpuinfo:
>
> processor : 43
> apicid : 114
> processor : 44
> apicid : 116
> processor : 45
> apicid : 118
> processor : 46
> apicid : 120
> processor : 47
> apicid : 122
>
> I have never seen Slurm report core numbers > 96 for a job.
> Nonetheless, I agree: the cores reported by Slurm mostly have nothing to do
> with the cores reported e.g. by cgroups.
> Since Slurm creates the cgroups for the jobs, it must know which cores are
> actually used, so I wonder why it reports some kind of abstract core ID.
>
> Best
> Marcus
>
> On 13.12.2022 at 16:39, Paul Raines wrote:
>>
>> Yes, looks like SLURM is using the apicid that is in /proc/cpuinfo
>> The first 14 cpus (processors 0-13) have apicid
>> 0,2,4,6,8,10,12,14,16,20,22,24,26,28 in /proc/cpuinfo
>>
>> So after setting CpuSpecList=0,2,4,6,8,10,12,14,16,18,20,22,24,26
>> in slurm.conf it appears to be doing what I want
>>
>> $ echo $SLURM_JOB_ID
>> 9
>> $ grep -i ^cpu /proc/self/status
>> Cpus_allowed: 000f0000,000f0000
>> Cpus_allowed_list: 16-19,48-51
>> $ scontrol -d show job 9 | grep CPU_ID
>> Nodes=larkin CPU_IDs=32-39 Mem=25600 GRES=
>>
>> apicid=32 is processor=16 and apicid=33 is processor=48 in /proc/cpuinfo
>>
>> Thanks
>>
>> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>>
>>
>>
>> On Tue, 13 Dec 2022 9:52am, Sean Maxwell wrote:
>>
>>> In the slurm.conf manual they state the CpuSpecList IDs are "abstract", and
>>> in the CPU management docs they reinforce the notion that the abstract Slurm
>>> IDs are not related to the Linux hardware IDs, so that is probably the
>>> source of the behavior. I unfortunately don't have more information.
>>>
>>> On Tue, Dec 13, 2022 at 9:45 AM Paul Raines <raines at nmr.mgh.harvard.edu>
>>> wrote:
>>>
>>>>
>>>> Hmm. Actually looks like confusion between CPU IDs on the system
>>>> and what SLURM thinks the IDs are
>>>>
>>>> # scontrol -d show job 8
>>>> ...
>>>> Nodes=foobar CPU_IDs=14-21 Mem=25600 GRES=
>>>> ...
>>>>
>>>> # cat /sys/fs/cgroup/system.slice/slurmstepd.scope/job_8/cpuset.cpus.effective
>>>> 7-10,39-42
>>>>
>>>>
>>>> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>>>>
>>>>
>>>>
>>>> On Tue, 13 Dec 2022 9:40am, Paul Raines wrote:
>>>>
>>>> >
>>>> > Oh but that does explain the CfgTRES=cpu=14. With the CpuSpecList
>>>> > below and SlurmdOffSpec I do get CfgTRES=cpu=50 so that makes sense.
>>>> >
>>>> > The issue remains that though the number of cpus in CpuSpecList
>>>> > is taken into account, the exact IDs seem to be ignored.
>>>> >
>>>> >
>>>> > -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>>>> >
>>>> >
>>>> >
>>>> > On Tue, 13 Dec 2022 9:34am, Paul Raines wrote:
>>>> >
>>>> >>
>>>> >> I have tried it both ways with the same result. The assigned CPUs
>>>> >> will be both in and out of the range given to CpuSpecList
>>>> >>
>>>> >> I tried setting using commas instead of ranges so used
>>>> >>
>>>> >> CpuSpecList=0,1,2,3,4,5,6,7,8,9,10,11,12,13
>>>> >>
>>>> >> But still does not work
>>>> >>
>>>> >> $ srun -p basic -N 1 --ntasks-per-node=1 --mem=25G \
>>>> >> --time=10:00:00 --cpus-per-task=8 --pty /bin/bash
>>>> >> $ grep -i ^cpu /proc/self/status
>>>> >> Cpus_allowed: 00000780,00000780
>>>> >> Cpus_allowed_list: 7-10,39-42
>>>> >>
>>>> >>
>>>> >> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Mon, 12 Dec 2022 10:21am, Sean Maxwell wrote:
>>>> >>
>>>> >>> Hi Paul,
>>>> >>>
>>>> >>>> Nodename=foobar \
>>>> >>>> CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 \
>>>> >>>> RealMemory=256312 MemSpecLimit=32768 CpuSpecList=14-63 \
>>>> >>>> TmpDisk=6000000 Gres=gpu:nvidia_rtx_a6000:1
>>>> >>>>
>>>> >>>> The slurm.conf also has:
>>>> >>>>
>>>> >>>> ProctrackType=proctrack/cgroup
>>>> >>>> TaskPlugin=task/affinity,task/cgroup
>>>> >>>> TaskPluginParam=Cores,SlurmdOffSpec,Verbose
>>>> >>>>
>>>> >>>
>>>> >>> Doesn't setting SlurmdOffSpec tell slurmd that it should NOT use the CPUs
>>>> >>> in the spec list?
>>>> >>> (https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdOffSpec)
>>>> >>> In this case, I believe it uses what is left, which is 0-13. We are
>>>> >>> just starting to work on this ourselves, and were looking at this
>>>> >>> setting.
>>>> >>>
>>>> >>> Best,
>>>> >>>
>>>> >>> -Sean
>>>> >>>
>>>> >>
>>>> >
>>
>
> --
> Dipl.-Inf. Marcus Wagner
>
> IT Center
> Group: Server, Storage, HPC
> Department: Systems and Operations
> RWTH Aachen University
> Seffenter Weg 23
> 52074 Aachen
> Tel: +49 241 80-24383
> Fax: +49 241 80-624383
> wagner at itc.rwth-aachen.de
> www.itc.rwth-aachen.de
>
> Social media channels of the IT Center:
> https://blog.rwth-aachen.de/itc/
> https://www.facebook.com/itcenterrwth
> https://www.linkedin.com/company/itcenterrwth
> https://twitter.com/ITCenterRWTH
> https://www.youtube.com/channel/UCKKDJJukeRwO0LP-ac8x8rQ
>
The information in this e-mail is intended only for the person to whom it is addressed. If you believe this e-mail was sent to you in error and the e-mail contains patient information, please contact the Mass General Brigham Compliance HelpLine at https://www.massgeneralbrigham.org/complianceline <https://www.massgeneralbrigham.org/complianceline> .
Please note that this e-mail is not secure (encrypted). If you do not wish to continue communication over unencrypted e-mail, please notify the sender of this message immediately. Continuing to send or respond to e-mail after receiving this message means you understand and accept this risk and wish to continue to communicate over unencrypted e-mail.