[slurm-users] CPUSpecList confusion

Wed Dec 14 17:11:48 UTC 2022

Ugh.  Guess I cannot count.  The mapping on that last node DOES work with
the "alternating" scheme, where we have (Slurm CPU_ID on the left, cgroup
CPU on the right)

  0  0
  1  2
  2  4
  3  6
  4  8
  5 10
  6 12
  7 14
  8 16
  9 18
10 20
11 22
12  1
13  3
14  5
15  7
16  9
17 11
18 13
19 15
20 17
21 19
22 21
23 23

so CPU_IDs=8-11,20-23 does correspond to cgroup CPUs 16-23: IDs 8-11 map
to 16,18,20,22 and IDs 20-23 map to 17,19,21,23.
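
As a rough sketch, the alternating table above can be reproduced with a little
integer arithmetic. This assumes the 2-socket, 12-cores-per-socket, no-HT node
described in this thread; the function names (slurm_to_os, expand_list,
translate_list) are made up for illustration and are not Slurm tools, and the
formula is only fitted to the table observed here, not taken from Slurm itself.

```shell
# Assumed topology of the node discussed above (from its slurm.conf line).
SOCKETS=2
CORES_PER_SOCKET=12

# slurm_to_os ID: map one abstract Slurm CPU ID to the cgroup/OS CPU number
# under the observed "alternating" numbering.
slurm_to_os() {
    echo $(( ($1 % $CORES_PER_SOCKET) * $SOCKETS + $1 / $CORES_PER_SOCKET ))
}

# expand_list "8-11,20-23": print every ID in a range list, one per line.
# The body runs in a subshell so the IFS change does not leak out.
expand_list() (
    IFS=,
    for part in $1; do
        i=${part%-*}     # start of the range (or the single ID)
        hi=${part#*-}    # end of the range (or the single ID)
        while [ "$i" -le "$hi" ]; do
            echo "$i"
            i=$(( i + 1 ))
        done
    done
)

# translate_list "8-11,20-23": map every ID, then print the sorted
# cgroup/OS CPU numbers as one comma-separated line.
translate_list() {
    expand_list "$1" | while read -r id; do
        slurm_to_os "$id"
    done | sort -n | paste -sd, -
}
```

For example, `translate_list 8-11,20-23` prints 16,17,18,19,20,21,22,23 (that
is, 16-23), and `translate_list 12-19` prints 1,3,5,7,9,11,13,15.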

Using the script below on that node, I get the output that follows it:

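# cgroup v1 layout: /sys/fs/cgroup/cpuset/slurm/uid_<uid>/job_<jobid>
cd /sys/fs/cgroup/cpuset/slurm
for d in $(find . -type d -name 'job_*') ; do
   j=$(echo "$d" | cut -d_ -f3)    # job ID is the third '_'-separated field
   echo "=== $j"
   # CPU_IDs=... is the 7th space-separated field of the indented Nodes= line
   scontrol -d show job "$j" | grep CPU_ID | cut -d' ' -f7
   cat "$d/cpuset.effective_cpus"
done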

=== 1967214
CPU_IDs=8-11,20-23
16-23
=== 1960208
CPU_IDs=12-19
1,3,5,7,9,11,13,15
=== 1966815
CPU_IDs=0
0
=== 1966821
CPU_IDs=6
12
=== 1966818
CPU_IDs=3
6
=== 1966816
CPU_IDs=1
2
=== 1966822
CPU_IDs=7
14
=== 1966820
CPU_IDs=5
10
=== 1966819
CPU_IDs=4
8
=== 1966817
CPU_IDs=2
4

On all my nodes I see just two schemes: the alternating odd/even one
above, and one that does not alternate, as on this box with

CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=1

=== 1966495
CPU_IDs=0-2
0-2
=== 1966498
CPU_IDs=10-12
10-12
=== 1966502
CPU_IDs=26-28
26-28
=== 1960064
CPU_IDs=7-9,13-25
7-9,13-25
=== 1954480
CPU_IDs=3-6
3-6

On Wed, 14 Dec 2022 9:42am, Paul Raines wrote:

>
> Yes, I see that on some of my other machines too.  So apicid is definitely 
> not what SLURM is using but somehow just lines up that way on this one 
> machine I have.
>
> I think the issue is that cgroups numbers all the cores on the first
> socket starting at 0, then all the cores on the second socket.  But SLURM
> (on a two-socket box) counts 0 as the first core on the first socket, 1 as
> the first core on the second socket, 2 as the second core on the first
> socket, 3 as the second core on the second socket, and so on. (Looks like
> I am wrong: see below)
>
> Why SLURM does this instead of just using the assignments cgroups uses,
> I have no idea.  Hopefully one of the SLURM developers reads this
> and can explain.
>
> Looking at another SLURM node I have (where cgroups v1 is still in use
> and HT turned off) with definition
>
> CPUs=24 Boards=1 SocketsPerBoard=2 CoresPerSocket=12 ThreadsPerCore=1
>
> I find
>
> [root@r440-17 ~]# egrep '^(apicid|proc)' /proc/cpuinfo  | tail -4
> processor       : 22
> apicid          : 22
> processor       : 23
> apicid          : 54
>
> So apicids are NOT going to work.
>
> # scontrol -d show job 1966817 | grep CPU_ID
>     Nodes=r17 CPU_IDs=2 Mem=16384 GRES=
> # cat /sys/fs/cgroup/cpuset/slurm/uid_3776056/job_1966817/cpuset.cpus
> 4
>
> If Slurm has '2' this should be the second core on the first socket so should 
> be '1' in cgroups, but it is 4 as we see above which is the fifth core on the 
> first socket.  So I guess I was wrong above.
>
> But in /proc/cpuinfo the apicid for processor 4 is 2!!!  So is apicid
> right after all?  Nope, on the same machine I have
>
> # scontrol -d show job 1960208 | grep CPU_ID
>     Nodes=r17 CPU_IDs=12-19 Mem=51200 GRES=
> # cat /sys/fs/cgroup/cpuset/slurm/uid_5164679/job_1960208/cpuset.cpus
> 1,3,5,7,9,11,13,15
>
> and in /proc/cpuinfo the apicid for processor 12 is 16
>
> # scontrol -d show job 1967214 | grep CPU_ID
>     Nodes=r17 CPU_IDs=8-11,20-23 Mem=51200 GRES=
> # cat /sys/fs/cgroup/cpuset/slurm/uid_5164679/job_1967214/cpuset.cpus
> 16-23
>
> I am totally lost now. Seems totally random. SLURM devs?  Any insight?
>
>
> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>
>
>
> On Wed, 14 Dec 2022 1:33am, Marcus Wagner wrote:
>
>>  Hi Paul,
>>
>>  sorry to say, but that has to be some coincidence on your system. I've
>>  never seen Slurm report core numbers that are higher than the total
>>  number of cores.
>>
>>  For example, I have an Intel Platinum 8160 here: 24 cores per socket, no
>>  Hyper-Threading activated.
>>  Yet here are the last lines of /proc/cpuinfo:
>>
>>  processor       : 43
>>  apicid          : 114
>>  processor       : 44
>>  apicid          : 116
>>  processor       : 45
>>  apicid          : 118
>>  processor       : 46
>>  apicid          : 120
>>  processor       : 47
>>  apicid          : 122
>>
>>  I have never seen Slurm report core numbers > 96 for a job.
>>  Nonetheless, I agree: the cores reported by Slurm mostly have nothing to
>>  do with the cores reported by, e.g., cgroups.
>>  Since Slurm creates the cgroups for the jobs, it should know which cores
>>  are used, so I wonder why it reports some kind of abstract core ID.
>>
>>  Best
>>  Marcus
>>
>>  On 13.12.2022 at 16:39, Paul Raines wrote:
>>>
>>>   Yes, looks like SLURM is using the apicid that is in /proc/cpuinfo.
>>>   The first 14 CPUs (processors 0-13) have apicid
>>>   0,2,4,6,8,10,12,14,16,20,22,24,26,28 in /proc/cpuinfo
>>>
>>>   So after setting CpuSpecList=0,2,4,6,8,10,12,14,16,18,20,22,24,26
>>>   in slurm.conf it appears to be doing what I want
>>>
>>>   $ echo $SLURM_JOB_ID
>>>   9
>>>   $ grep -i ^cpu /proc/self/status
>>>   Cpus_allowed:   000f0000,000f0000
>>>   Cpus_allowed_list:      16-19,48-51
>>>   $ scontrol -d show job 9 | grep CPU_ID
>>>         Nodes=larkin CPU_IDs=32-39 Mem=25600 GRES=
>>>
>>>   apicid=32 is processor=16 and apicid=33 is processor=48 in /proc/cpuinfo
>>>
>>>   Thanks
>>>
>>>   -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>>> 
>>> 
>>>
>>>   On Tue, 13 Dec 2022 9:52am, Sean Maxwell wrote:
>>>
>>>>   In the slurm.conf manual they state that the CpuSpecList IDs are
>>>>   "abstract", and in the CPU management docs they reinforce the notion
>>>>   that the abstract Slurm IDs are not related to the Linux hardware IDs,
>>>>   so that is probably the source of the behavior. Unfortunately, I don't
>>>>   have more information.
>>>>
>>>>   On Tue, Dec 13, 2022 at 9:45 AM Paul Raines <raines at nmr.mgh.harvard.edu> wrote:
>>>> 
>>>>>
>>>>>   Hmm.  Actually looks like confusion between CPU IDs on the system
>>>>>   and what SLURM thinks the IDs are
>>>>>
>>>>>   # scontrol -d show job 8
>>>>>   ...
>>>>>         Nodes=foobar CPU_IDs=14-21 Mem=25600 GRES=
>>>>>   ...
>>>>>
>>>>>   # cat /sys/fs/cgroup/system.slice/slurmstepd.scope/job_8/cpuset.cpus.effective
>>>>>   7-10,39-42
>>>>> 
>>>>>
>>>>>   -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>>>>>
>>>>> 
>>>>>
>>>>>   On Tue, 13 Dec 2022 9:40am, Paul Raines wrote:
>>>>> 
>>>>> > 
>>>>> >   Oh but that does explain the CfgTRES=cpu=14.  With the CpuSpecList
>>>>> >   below and SlurmdOffSpec I do get CfgTRES=cpu=50 so that makes sense.
>>>>> > 
>>>>> >   The issue remains that, though the number of CPUs in CpuSpecList
>>>>> >   is taken into account, the exact IDs seem to be ignored.
>>>>> > 
>>>>> > 
>>>>> >   -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>>>>> > 
>>>>> > 
>>>>> > 
>>>>> >   On Tue, 13 Dec 2022 9:34am, Paul Raines wrote:
>>>>> > 
>>>>> >> 
>>>>> >>    I have tried it both ways with the same result.  The assigned CPUs
>>>>> >>    will be both in and out of the range given to CpuSpecList
>>>>> >> 
>>>>> >>    I tried setting using commas instead of ranges so used
>>>>> >> 
>>>>> >>    CpuSpecList=0,1,2,3,4,5,6,7,8,9,10,11,12,13
>>>>> >> 
>>>>> >>    But still does not work
>>>>> >> 
>>>>> >>    $ srun -p basic -N 1 --ntasks-per-node=1 --mem=25G \
>>>>> >>    --time=10:00:00 --cpus-per-task=8 --pty /bin/bash
>>>>> >>    $ grep -i ^cpu /proc/self/status
>>>>> >>    Cpus_allowed:   00000780,00000780
>>>>> >>    Cpus_allowed_list:      7-10,39-42
>>>>> >> 
>>>>> >> 
>>>>> >>    -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>>>>> >> 
>>>>> >> 
>>>>> >> 
>>>>> >>    On Mon, 12 Dec 2022 10:21am, Sean Maxwell wrote:
>>>>> >> 
>>>>> >>>     Hi Paul,
>>>>> >>> 
>>>>> >>>>     Nodename=foobar \
>>>>> >>>>        CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 \
>>>>> >>>>        RealMemory=256312 MemSpecLimit=32768 CpuSpecList=14-63 \
>>>>> >>>>        TmpDisk=6000000 Gres=gpu:nvidia_rtx_a6000:1
>>>>> >>>> 
>>>>> >>>>     The slurm.conf also has:
>>>>> >>>> 
>>>>> >>>>     ProctrackType=proctrack/cgroup
>>>>> >>>>     TaskPlugin=task/affinity,task/cgroup
>>>>> >>>>     TaskPluginParam=Cores,SlurmdOffSpec,Verbose
>>>>> >>>> 
>>>>> >>> 
>>>>> >>>     Doesn't setting SlurmdOffSpec tell slurmd that it should NOT use the
>>>>> >>>     CPUs in the spec list?
>>>>> >>>     (https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdOffSpec)
>>>>> >>>     In this case, I believe it uses what is left, which is 0-13. We are
>>>>> >>>     just starting to work on this ourselves, and were looking at this
>>>>> >>>     setting.
>>>>> >>> 
>>>>> >>>     Best,
>>>>> >>> 
>>>>> >>>     -Sean
>>>>> >>> 
>>>>> >> 
>>>>> >
>>>>>   The information in this e-mail is intended only for the person to whom
>>>>>   it is addressed.  If you believe this e-mail was sent to you in error
>>>>>   and the e-mail contains patient information, please contact the Mass
>>>>>   General Brigham Compliance HelpLine at
>>>>>   https://www.massgeneralbrigham.org/complianceline .
>>>>>   Please note that this e-mail is not secure (encrypted).  If you do not
>>>>>   wish to continue communication over unencrypted e-mail, please notify
>>>>>   the sender of this message immediately.  Continuing to send or respond
>>>>>   to e-mail after receiving this message means you understand and accept
>>>>>   this risk and wish to continue to communicate over unencrypted e-mail.
>>>>> 
>>>>>
>>> 
>>
>>  --
>>  Dipl.-Inf. Marcus Wagner
>>
>>  IT Center
>>  Group: Server, Storage, HPC
>>  Department: Systems and Operations
>>  RWTH Aachen University
>>  Seffenter Weg 23
>>  52074 Aachen
>>  Tel: +49 241 80-24383
>>  Fax: +49 241 80-624383
>>  wagner at itc.rwth-aachen.de
>>  www.itc.rwth-aachen.de
>> 
>