[slurm-users] CPUSpecList confusion
Paul Raines
raines at nmr.mgh.harvard.edu
Tue Dec 13 14:45:23 UTC 2022
Hmm. Actually, this looks like confusion between the CPU IDs on the system
and what SLURM thinks the IDs are:
# scontrol -d show job 8
...
Nodes=foobar CPU_IDs=14-21 Mem=25600 GRES=
...
# cat /sys/fs/cgroup/system.slice/slurmstepd.scope/job_8/cpuset.cpus.effective
7-10,39-42
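
A possible explanation, offered only as a sketch: if the CPU_IDs reported by
scontrol are Slurm's abstract IDs (numbered core by core, with the two
hardware threads of a core adjacent) and the kernel instead numbers the first
thread of every core 0-31 and the second threads 32-63, then the two listings
above describe the same CPUs. Both numbering schemes are assumptions here and
have not been confirmed on foobar. The function names are made up for
illustration, and the defaults come from the node definition quoted further
down (2 sockets, 16 cores per socket, 2 threads per core).

```python
def abstract_to_os_cpu(abstract_id, sockets=2, cores_per_socket=16,
                       threads_per_core=2):
    """Map an abstract CPU ID to a kernel CPU number.

    ASSUMPTION: abstract IDs enumerate the threads of core 0, then core 1, ...
    ASSUMPTION: the kernel lists thread 0 of every core first, then thread 1.
    """
    total_cores = sockets * cores_per_socket
    core, thread = divmod(abstract_id, threads_per_core)
    return thread * total_cores + core


def to_ranges(ids):
    """Format CPU numbers as a compact list such as '7-10,39-42'."""
    parts = []
    start = prev = None
    for i in sorted(set(ids)):
        if start is None:
            start = prev = i
        elif i == prev + 1:
            prev = i
        else:
            parts.append(f"{start}-{prev}" if prev > start else str(start))
            start = prev = i
    if start is not None:
        parts.append(f"{start}-{prev}" if prev > start else str(start))
    return ",".join(parts)


# CPU_IDs=14-21 from "scontrol -d show job 8"
print(to_ranges(abstract_to_os_cpu(i) for i in range(14, 22)))  # 7-10,39-42
```

Under those assumptions the result matches cpuset.cpus.effective. The actual
kernel numbering on a node can be checked with "lscpu -e" or
/sys/devices/system/cpu/cpuN/topology/thread_siblings_list.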
-- Paul Raines (http://help.nmr.mgh.harvard.edu)
On Tue, 13 Dec 2022 9:40am, Paul Raines wrote:
>
> Oh but that does explain the CfgTRES=cpu=14. With the CpuSpecList
> below and SlurmdOffSpec I do get CfgTRES=cpu=50 so that makes sense.
>
> The issue remains that though the number of CPUs in CpuSpecList
> is taken into account, the exact IDs seem to be ignored.
>
>
> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>
>
>
> On Tue, 13 Dec 2022 9:34am, Paul Raines wrote:
>
>>
>> I have tried it both ways with the same result. The assigned CPUs
>> will be both in and out of the range given to CpuSpecList.
>>
>> I tried setting it using commas instead of ranges, so I used
>>
>> CpuSpecList=0,1,2,3,4,5,6,7,8,9,10,11,12,13
>>
>> But it still does not work:
>>
>> $ srun -p basic -N 1 --ntasks-per-node=1 --mem=25G \
>> --time=10:00:00 --cpus-per-task=8 --pty /bin/bash
>> $ grep -i ^cpu /proc/self/status
>> Cpus_allowed: 00000780,00000780
>> Cpus_allowed_list: 7-10,39-42
>>
>>
>> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>>
>>
>>
>> On Mon, 12 Dec 2022 10:21am, Sean Maxwell wrote:
>>
>>> Hi Paul,
>>>
>>>> Nodename=foobar \
>>>> CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 \
>>>> RealMemory=256312 MemSpecLimit=32768 CpuSpecList=14-63 \
>>>> TmpDisk=6000000 Gres=gpu:nvidia_rtx_a6000:1
>>>>
>>>> The slurm.conf also has:
>>>>
>>>> ProctrackType=proctrack/cgroup
>>>> TaskPlugin=task/affinity,task/cgroup
>>>> TaskPluginParam=Cores,SlurmdOffSpec,Verbose
>>>>
>>>
>>> Doesn't setting SlurmdOffSpec tell Slurmd that it should NOT use the
>>> CPUs in the spec list?
>>> (https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdOffSpec)
>>> In this case, I believe it uses what is left, which is 0-13. We are
>>> just starting to work on this ourselves, and were looking at this
>>> setting.
>>>
>>> Best,
>>>
>>> -Sean
>>>
>>
>