[slurm-users] CPUSpecList confusion

Wed Dec 14 14:42:23 UTC 2022

Yes, I see that on some of my other machines too.  So apicid is definitely 
not what SLURM is using but somehow just lines up that way on this one 
machine I have.

I think the issue is that cgroups counts from 0 through all the cores on
the first socket, then all the cores on the second socket.  But SLURM (on
a two-socket box) counts 0 as the first core on the first socket, 1 as the
first core on the second socket, 2 as the second core on the first socket,
3 as the second core on the second socket, and so on.  (Looks like I am
wrong: see below.)

Why SLURM does this instead of just using the assignments cgroups uses
I have no idea.  Hopefully one of the SLURM developers reads this
and can explain.

Looking at another SLURM node I have (where cgroups v1 is still in use
and HT is turned off) with the definition

CPUs=24 Boards=1 SocketsPerBoard=2 CoresPerSocket=12 ThreadsPerCore=1

I find

[root at r440-17 ~]# egrep '^(apicid|proc)' /proc/cpuinfo  | tail -4
processor       : 22
apicid          : 22
processor       : 23
apicid          : 54

So apicids are NOT going to work.
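
A small sketch of how one could tabulate this on any node rather than
eyeballing the egrep output.  It pulls the processor/apicid pairs out of
/proc/cpuinfo-style text and flags any apicid that is not a valid index
into the node's CPU count.  The function name is my own, and the sample
text is only the four lines shown above, not the full file.

```python
import re

def apicid_by_processor(cpuinfo_text):
    """Map each 'processor' number to its 'apicid' from /proc/cpuinfo-style text."""
    mapping = {}
    current = None
    for line in cpuinfo_text.splitlines():
        m = re.match(r"^(processor|apicid)\s*:\s*(\d+)\s*$", line)
        if not m:
            continue
        key, value = m.group(1), int(m.group(2))
        if key == "processor":
            current = value          # start of a new processor stanza
        elif current is not None:
            mapping[current] = value  # apicid belongs to the current stanza
    return mapping

# Only the excerpt quoted above, not a complete /proc/cpuinfo.
SAMPLE = """\
processor       : 22
apicid          : 22
processor       : 23
apicid          : 54
"""

ids = apicid_by_processor(SAMPLE)
total_cpus = 24  # CPUs=24 from the node definition above
too_big = {p: a for p, a in ids.items() if a >= total_cpus}
print(ids)
print(too_big)
```

On a real node the same function can be fed open("/proc/cpuinfo").read().
Any entry in too_big is an apicid that cannot be a CPU index on a 24-CPU
node.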

# scontrol -d show job 1966817 | grep CPU_ID
      Nodes=r17 CPU_IDs=2 Mem=16384 GRES=
# cat /sys/fs/cgroup/cpuset/slurm/uid_3776056/job_1966817/cpuset.cpus
4

If Slurm reports '2', then by my theory this should be the second core on
the first socket, and so '1' in cgroups.  But as we see above it is 4,
which is the fifth core on the first socket.  So I guess I was wrong above.
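
To make that check explicit, here is a minimal sketch of the numbering I
guessed at above (round-robin across sockets on the SLURM side, socket by
socket on the cgroups side).  The function name and the mapping are only
my guess written out as code, not anything taken from the SLURM source or
docs.

```python
def roundrobin_to_block(slurm_id, sockets=2, cores_per_socket=12):
    """Convert an ID numbered round-robin across sockets into one numbered
    socket by socket.  This encodes the guess above, not documented behaviour."""
    socket = slurm_id % sockets   # IDs alternate between the sockets
    core = slurm_id // sockets    # position of the core within its socket
    return socket * cores_per_socket + core

predicted = roundrobin_to_block(2)
observed = 4  # cpuset.cpus for job 1966817
print(predicted, observed, predicted == observed)
```

The guess predicts 1 for CPU_IDs=2 while the cgroup holds 4, so that
numbering does not describe this node.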

But in /proc/cpuinfo the apicid for processor 4 is 2!!!  So is apicid
right after all?  Nope, on the same machine I have

# scontrol -d show job 1960208 | grep CPU_ID
      Nodes=r17 CPU_IDs=12-19 Mem=51200 GRES=
# cat /sys/fs/cgroup/cpuset/slurm/uid_5164679/job_1960208/cpuset.cpus
1,3,5,7,9,11,13,15

and in /proc/cpuinfo the apicid for processor 12 is 16.

# scontrol -d show job 1967214 | grep CPU_ID
      Nodes=r17 CPU_IDs=8-11,20-23 Mem=51200 GRES=
# cat /sys/fs/cgroup/cpuset/slurm/uid_5164679/job_1967214/cpuset.cpus
16-23

I am totally lost now. Seems totally random. SLURM devs?  Any insight?
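
One more guess that would be worth ruling out, and I stress it is only an
assumption: SLURM numbers the cores socket by socket (0-11 on the first
socket, 12-23 on the second), while the kernel on this box alternates
sockets (even processors on one socket, odd on the other).  The sketch
below writes that mapping out and compares it with the three jobs above.
The function names are made up for illustration, and agreement on three
jobs could be a coincidence just like apicid was.

```python
def parse_cpu_list(text):
    """Expand a list such as '8-11,20-23' into a sorted list of integers."""
    ids = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ids.update(range(int(lo), int(hi) + 1))
        else:
            ids.add(int(part))
    return sorted(ids)

def block_to_interleaved(slurm_id, sockets=2, cores_per_socket=12):
    """Assumed mapping: socket-by-socket ID -> socket-alternating ID."""
    socket = slurm_id // cores_per_socket
    core = slurm_id % cores_per_socket
    return core * sockets + socket

# (CPU_IDs from scontrol, cpuset.cpus from the cgroup) for the jobs above.
OBSERVED = [
    ("2", "4"),
    ("12-19", "1,3,5,7,9,11,13,15"),
    ("8-11,20-23", "16-23"),
]

def check(observed=OBSERVED):
    """Return one True/False per job: does the assumed mapping match?"""
    results = []
    for slurm_ids, cgroup_ids in observed:
        predicted = sorted(block_to_interleaved(i) for i in parse_cpu_list(slurm_ids))
        results.append(predicted == parse_cpu_list(cgroup_ids))
    return results

print(check())
```

If every entry comes back True, the next step would be to compare against
the socket layout the kernel itself reports (for example the "physical id"
field in /proc/cpuinfo, or lscpu -e) before trusting it.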

-- Paul Raines (http://help.nmr.mgh.harvard.edu)

On Wed, 14 Dec 2022 1:33am, Marcus Wagner wrote:

> Hi Paul,
>
> sorry to say, but that has to be some coincidence on your system. I've never 
> seen Slurm report core numbers that are higher than the total number of 
> cores.
>
> For example, I have an Intel Platinum 8160 here: 24 cores per socket, 
> HyperThreading not activated.
> Yet here are the last lines of /proc/cpuinfo:
>
> processor       : 43
> apicid          : 114
> processor       : 44
> apicid          : 116
> processor       : 45
> apicid          : 118
> processor       : 46
> apicid          : 120
> processor       : 47
> apicid          : 122
>
> I have never seen Slurm report core numbers > 96 for a job.
> Nonetheless, I agree: the cores reported by Slurm mostly have nothing to do 
> with the cores reported by, e.g., cgroups.
> Since Slurm creates the cgroups for the jobs, it should know which cores 
> are used, so I wonder why it reports some kind of abstract core ID 
> instead.
>
> Best
> Marcus
>
> On 13.12.2022 at 16:39, Paul Raines wrote:
>>
>>  Yes, looks like SLURM is using the apicid that is in /proc/cpuinfo.
>>  The first 14 CPUs in /proc/cpuinfo (processors 0-13) have apicid
>>  0,2,4,6,8,10,12,14,16,20,22,24,26,28
>>
>>  So after setting CpuSpecList=0,2,4,6,8,10,12,14,16,18,20,22,24,26
>>  in slurm.conf it appears to be doing what I want
>>
>>  $ echo $SLURM_JOB_ID
>>  9
>>  $ grep -i ^cpu /proc/self/status
>>  Cpus_allowed:   000f0000,000f0000
>>  Cpus_allowed_list:      16-19,48-51
>>  $ scontrol -d show job 9 | grep CPU_ID
>>        Nodes=larkin CPU_IDs=32-39 Mem=25600 GRES=
>>
>>  apicid=32 is processor=16 and apicid=33 is processor=48 in /proc/cpuinfo
>>
>>  Thanks
>>
>>  -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>>
>>
>>
>>  On Tue, 13 Dec 2022 9:52am, Sean Maxwell wrote:
>>
>>>  In the slurm.conf manual they state the CpuSpecList ids are "abstract",
>>>  and in the CPU management docs they reinforce the notion that the
>>>  abstract Slurm IDs are not related to the Linux hardware IDs, so that is
>>>  probably the source of the behavior. I unfortunately don't have more
>>>  information.
>>>
>>>  On Tue, Dec 13, 2022 at 9:45 AM Paul Raines <raines at nmr.mgh.harvard.edu>
>>>  wrote:
>>> 
>>>>
>>>>  Hmm.  Actually looks like confusion between CPU IDs on the system
>>>>  and what SLURM thinks the IDs are
>>>>
>>>>  # scontrol -d show job 8
>>>>  ...
>>>>        Nodes=foobar CPU_IDs=14-21 Mem=25600 GRES=
>>>>  ...
>>>>
>>>>  # cat /sys/fs/cgroup/system.slice/slurmstepd.scope/job_8/cpuset.cpus.effective
>>>>  7-10,39-42
>>>> 
>>>>
>>>>  -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>>>> 
>>>> 
>>>>
>>>>  On Tue, 13 Dec 2022 9:40am, Paul Raines wrote:
>>>> 
>>>> > 
>>>> >  Oh but that does explain the CfgTRES=cpu=14.  With the CpuSpecList
>>>> >  below and SlurmdOffSpec I do get CfgTRES=cpu=50 so that makes sense.
>>>> > 
>>>> >  The issue remains that though the number of CPUs in CpuSpecList
>>>> >  is taken into account, the exact IDs seem to be ignored.
>>>> > 
>>>> > 
>>>> >  -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>>>> > 
>>>> > 
>>>> > 
>>>> >  On Tue, 13 Dec 2022 9:34am, Paul Raines wrote:
>>>> > 
>>>> >> 
>>>> >>   I have tried it both ways with the same result.  The assigned CPUs
>>>> >>   will be both in and out of the range given to CpuSpecList
>>>> >> 
>>>> >>   I tried setting using commas instead of ranges so used
>>>> >> 
>>>> >>   CpuSpecList=0,1,2,3,4,5,6,7,8,9,10,11,12,13
>>>> >> 
>>>> >>   But still does not work
>>>> >> 
>>>> >>   $ srun -p basic -N 1 --ntasks-per-node=1 --mem=25G \
>>>> >>   --time=10:00:00 --cpus-per-task=8 --pty /bin/bash
>>>> >>   $ grep -i ^cpu /proc/self/status
>>>> >>   Cpus_allowed:   00000780,00000780
>>>> >>   Cpus_allowed_list:      7-10,39-42
>>>> >> 
>>>> >> 
>>>> >>   -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>>>> >> 
>>>> >> 
>>>> >> 
>>>> >>   On Mon, 12 Dec 2022 10:21am, Sean Maxwell wrote:
>>>> >> 
>>>> >>>    Hi Paul,
>>>> >>> 
>>>> >>>>    Nodename=foobar \
>>>> >>>>       CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 \
>>>> >>>>       RealMemory=256312 MemSpecLimit=32768 CpuSpecList=14-63 \
>>>> >>>>       TmpDisk=6000000 Gres=gpu:nvidia_rtx_a6000:1
>>>> >>>> 
>>>> >>>>    The slurm.conf also has:
>>>> >>>> 
>>>> >>>>    ProctrackType=proctrack/cgroup
>>>> >>>>    TaskPlugin=task/affinity,task/cgroup
>>>> >>>>    TaskPluginParam=Cores,SlurmdOffSpec,Verbose
>>>> >>>> 
>>>> >>> 
>>>> >>>    Doesn't setting SlurmdOffSpec tell Slurmd that it should NOT use
>>>> >>>    the CPUs in the spec list?
>>>> >>>    (https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdOffSpec)
>>>> >>>    In this case, I believe it uses what is left, which is the 0-13.
>>>> >>>    We are just starting to work on this ourselves, and were looking
>>>> >>>    at this setting.
>>>> >>> 
>>>> >>>    Best,
>>>> >>> 
>>>> >>>    -Sean
>>>> >>> 
>>>> >> 
>>>> >
>>>>  The information in this e-mail is intended only for the person to whom
>>>>  it is addressed.  If you believe this e-mail was sent to you in error and
>>>>  the e-mail contains patient information, please contact the Mass General
>>>>  Brigham Compliance HelpLine at
>>>>  https://www.massgeneralbrigham.org/complianceline .
>>>>  Please note that this e-mail is not secure (encrypted).  If you do not
>>>>  wish to continue communication over unencrypted e-mail, please notify
>>>>  the sender of this message immediately.  Continuing to send or respond to
>>>>  e-mail after receiving this message means you understand and accept this
>>>>  risk and wish to continue to communicate over unencrypted e-mail.
>>>> 
>>>>
>> 
>
> -- 
> Dipl.-Inf. Marcus Wagner
>
> IT Center
> Group: Server, Storage, HPC
> Department: Systems and Operations
> RWTH Aachen University
> Seffenter Weg 23
> 52074 Aachen
> Tel: +49 241 80-24383
> Fax: +49 241 80-624383
> wagner at itc.rwth-aachen.de
> www.itc.rwth-aachen.de
>