<div dir="ltr">Nice find. Thanks for sharing back.<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Dec 13, 2022 at 10:39 AM Paul Raines <<a href="mailto:raines@nmr.mgh.harvard.edu">raines@nmr.mgh.harvard.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
Yes, it looks like SLURM is using the apicid values from /proc/cpuinfo.<br>
The first 14 CPUs in /proc/cpuinfo (processors 0-13) have apicid<br>
0,2,4,6,8,10,12,14,16,20,22,24,26,28<br>
<br>
So after setting CpuSpecList=0,2,4,6,8,10,12,14,16,18,20,22,24,26<br>
in slurm.conf it appears to be doing what I want<br>
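
The processor-to-apicid lookup described above can be sketched in a few lines of Python. This is only a sketch: it assumes an x86-style /proc/cpuinfo with `processor` and `apicid` fields, and it assumes, as observed in this thread, that the CpuSpecList IDs line up with the apicid values on this node. The function names `parse_apicids` and `spec_list_for` and the `SAMPLE` text are made up for illustration and are not part of Slurm.

```python
def parse_apicids(cpuinfo_text):
    """Return {processor: apicid} parsed from /proc/cpuinfo-style text."""
    mapping = {}
    proc = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between CPU stanzas
        key = key.strip()
        if key == "processor":
            proc = int(value)
        elif key == "apicid" and proc is not None:
            # "initial apicid" is a different key and is skipped
            mapping[proc] = int(value)
    return mapping


def spec_list_for(mapping, processors):
    """Comma-separated apicids for the given Linux processor numbers."""
    return ",".join(str(mapping[p]) for p in processors)


# Hypothetical two-CPU excerpt, only to show the shape of the input.
SAMPLE = (
    "processor\t: 0\ninitial apicid\t: 0\napicid\t\t: 0\n\n"
    "processor\t: 1\ninitial apicid\t: 2\napicid\t\t: 2\n"
)

print(spec_list_for(parse_apicids(SAMPLE), [0, 1]))  # prints 0,2
```

On a real node, the text would come from reading /proc/cpuinfo, and the processor list would be whichever Linux CPUs you want covered by CpuSpecList.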
<br>
$ echo $SLURM_JOB_ID<br>
9<br>
$ grep -i ^cpu /proc/self/status<br>
Cpus_allowed: 000f0000,000f0000<br>
Cpus_allowed_list: 16-19,48-51<br>
$ scontrol -d show job 9 | grep CPU_ID<br>
Nodes=larkin CPU_IDs=32-39 Mem=25600 GRES=<br>
<br>
apicid=32 is processor=16 and apicid=33 is processor=48 in /proc/cpuinfo<br>
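
As a cross-check on the transcript above, the `Cpus_allowed` hex mask can be decoded by hand and compared with `Cpus_allowed_list`. The sketch below assumes the usual kernel layout, comma-separated 32-bit hex words with the most significant word first. The helper names `mask_to_cpus` and `to_ranges` are made up for illustration.

```python
def mask_to_cpus(mask):
    """Decode a Cpus_allowed hex mask into a list of Linux CPU numbers."""
    # Dropping the commas yields one big hex number; bit N set means CPU N.
    value = int(mask.replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def to_ranges(cpus):
    """Format CPU numbers as a range list such as '16-19,48-51'."""
    ranges = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu  # extend the current run
        else:
            ranges.append([cpu, cpu])  # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


print(to_ranges(mask_to_cpus("000f0000,000f0000")))  # prints 16-19,48-51
```

The same decoding applied to the earlier mask `00000780,00000780` gives 7-10,39-42, matching the list reported for that job.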
<br>
Thanks<br>
<br>
-- Paul Raines (<a href="http://help.nmr.mgh.harvard.edu" rel="noreferrer" target="_blank">http://help.nmr.mgh.harvard.edu</a>)<br>
<br>
<br>
<br>
On Tue, 13 Dec 2022 9:52am, Sean Maxwell wrote:<br>
<br>
> In the slurm.conf manual they state the CpuSpecList ids are "abstract", and<br>
> in the CPU management docs they enforce the notion that the abstract Slurm<br>
> IDs are not related to the Linux hardware IDs, so that is probably the<br>
> source of the behavior. I unfortunately don't have more information.<br>
><br>
> On Tue, Dec 13, 2022 at 9:45 AM Paul Raines <<a href="mailto:raines@nmr.mgh.harvard.edu" target="_blank">raines@nmr.mgh.harvard.edu</a>><br>
> wrote:<br>
><br>
>><br>
>> Hmm. Actually looks like confusion between CPU IDs on the system<br>
>> and what SLURM thinks the IDs are<br>
>><br>
>> # scontrol -d show job 8<br>
>> ...<br>
>> Nodes=foobar CPU_IDs=14-21 Mem=25600 GRES=<br>
>> ...<br>
>><br>
>> # cat /sys/fs/cgroup/system.slice/slurmstepd.scope/job_8/cpuset.cpus.effective<br>
>> 7-10,39-42<br>
>><br>
>><br>
>> -- Paul Raines (<a href="http://help.nmr.mgh.harvard.edu" rel="noreferrer" target="_blank">http://help.nmr.mgh.harvard.edu</a>)<br>
>><br>
>><br>
>><br>
>> On Tue, 13 Dec 2022 9:40am, Paul Raines wrote:<br>
>><br>
>> ><br>
>> > Oh but that does explain the CfgTRES=cpu=14. With the CpuSpecList<br>
>> > below and SlurmdOffSpec I do get CfgTRES=cpu=50 so that makes sense.<br>
>> ><br>
>> > The issue remains that, though the number of CPUs in CpuSpecList<br>
>> > is taken into account, the exact IDs seem to be ignored.<br>
>> ><br>
>> ><br>
>> > -- Paul Raines (<a href="http://help.nmr.mgh.harvard.edu" rel="noreferrer" target="_blank">http://help.nmr.mgh.harvard.edu</a>)<br>
>> ><br>
>> ><br>
>> ><br>
>> > On Tue, 13 Dec 2022 9:34am, Paul Raines wrote:<br>
>> ><br>
>> >><br>
>> >> I have tried it both ways with the same result. The assigned CPUs<br>
>> >> will be both in and out of the range given to CpuSpecList<br>
>> >><br>
>> >> I also tried using commas instead of a range, so I used<br>
>> >><br>
>> >> CpuSpecList=0,1,2,3,4,5,6,7,8,9,10,11,12,13<br>
>> >><br>
>> >> But still does not work<br>
>> >><br>
>> >> $ srun -p basic -N 1 --ntasks-per-node=1 --mem=25G \<br>
>> >> --time=10:00:00 --cpus-per-task=8 --pty /bin/bash<br>
>> >> $ grep -i ^cpu /proc/self/status<br>
>> >> Cpus_allowed: 00000780,00000780<br>
>> >> Cpus_allowed_list: 7-10,39-42<br>
>> >><br>
>> >><br>
>> >> -- Paul Raines (<a href="http://help.nmr.mgh.harvard.edu" rel="noreferrer" target="_blank">http://help.nmr.mgh.harvard.edu</a>)<br>
>> >><br>
>> >><br>
>> >><br>
>> >> On Mon, 12 Dec 2022 10:21am, Sean Maxwell wrote:<br>
>> >><br>
>> >>> Hi Paul,<br>
>> >>><br>
>> >>>> Nodename=foobar \<br>
>> >>>> CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 \<br>
>> >>>> RealMemory=256312 MemSpecLimit=32768 CpuSpecList=14-63 \<br>
>> >>>> TmpDisk=6000000 Gres=gpu:nvidia_rtx_a6000:1<br>
>> >>>><br>
>> >>>> The slurm.conf also has:<br>
>> >>>><br>
>> >>>> ProctrackType=proctrack/cgroup<br>
>> >>>> TaskPlugin=task/affinity,task/cgroup<br>
>> >>>> TaskPluginParam=Cores,SlurmdOffSpec,Verbose<br>
>> >>>><br>
>> >>><br>
>> >>> Doesn't setting SlurmdOffSpec tell Slurmd that it should NOT use the CPUs<br>
>> >>> in the spec list? (<a href="https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdOffSpec" rel="noreferrer" target="_blank">https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdOffSpec</a>)<br>
>> >>> In this case, I believe it uses what is left, which is 0-13. We are just<br>
>> >>> starting to work on this ourselves, and were looking at this setting.<br>
>> >>><br>
>> >>> Best,<br>
>> >>><br>
>> >>> -Sean<br>
>> >>><br>
>> >><br>
>> ><br>
>> The information in this e-mail is intended only for the person to whom it<br>
>> is addressed. If you believe this e-mail was sent to you in error and the<br>
>> e-mail contains patient information, please contact the Mass General<br>
>> Brigham Compliance HelpLine at<br>
>> <a href="https://www.massgeneralbrigham.org/complianceline" rel="noreferrer" target="_blank">https://www.massgeneralbrigham.org/complianceline</a>.<br>
>> Please note that this e-mail is not secure (encrypted). If you do not<br>
>> wish to continue communication over unencrypted e-mail, please notify the<br>
>> sender of this message immediately. Continuing to send or respond to<br>
>> e-mail after receiving this message means you understand and accept this<br>
>> risk and wish to continue to communicate over unencrypted e-mail.<br>
>><br>
>><br>
<br>
</blockquote></div>