[slurm-users] CPUSpecList confusion
Paul Raines
raines at nmr.mgh.harvard.edu
Fri Dec 9 23:09:57 UTC 2022
I have a Rocky 8 system that, with hyperthreading, has 64 logical cores.
I want the first 14 cores reserved for logged-in users and non-SLURM
work. I want SLURM to use the rest.
I configured the box to boot with systemd.unified_cgroup_hierarchy=1
to use cgroup v2.
I ran:
systemctl set-property user.slice AllowedCPUs=0-13
systemctl set-property user.slice MemoryHigh=32768K
and this does work to make the user logins stick to cores 0-13
(system.slice still reports all 64 CPUs):
# cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective
0-63
# cat /sys/fs/cgroup/user.slice/cpuset.cpus.effective
0-13
$ ssh raines@foobar
$ grep -i ^cpu /proc/self/status
Cpus_allowed: 00000000,00003fff
Cpus_allowed_list: 0-13
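For reference, the Cpus_allowed value is a hexadecimal bitmap written as comma-separated 32-bit words, highest word first, where bit N set means CPU N is allowed. A minimal shell sketch for decoding it by hand follows; mask_to_cpus is a hypothetical helper name used only for illustration, not part of any tool.

```shell
# Hypothetical helper: list the CPU numbers set in a Cpus_allowed hex mask.
mask_to_cpus() (
    hex=$(printf '%s' "$1" | tr -d ',')   # drop the comma between 32-bit words
    cpu=0
    cpus=
    while [ -n "$hex" ]; do
        last=${hex#"${hex%?}"}            # rightmost hex digit covers the next 4 CPUs
        hex=${hex%?}
        digit=$((0x$last))
        for bit in 0 1 2 3; do
            if [ $(( (digit >> bit) & 1 )) -eq 1 ]; then
                cpus="$cpus${cpus:+ }$cpu"
            fi
            cpu=$((cpu + 1))
        done
    done
    printf '%s\n' "$cpus"
)

# The mask seen in the ssh session above
mask_to_cpus 00000000,00003fff
```

With the mask from the login session this lists CPUs 0 through 13, which agrees with the Cpus_allowed_list line.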
I then set up Slurm with my node defined as:
Nodename=foobar \
CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 \
RealMemory=256312 MemSpecLimit=32768 CpuSpecList=14-63 \
TmpDisk=6000000 Gres=gpu:nvidia_rtx_a6000:1
The slurm.conf also has:
ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup
TaskPluginParam=Cores,SlurmdOffSpec,Verbose
But when I submit jobs, they are still assigned to
cores in the 0-13 range:
$ srun -p basic -N 1 --ntasks-per-node=1 --mem=25G \
--time=10:00:00 --cpus-per-task=8 --pty /bin/bash
$ grep -i ^cpu /proc/self/status
Cpus_allowed: 0000000f,0000000f
Cpus_allowed_list: 0-3,32-35
I also noticed that if I submit another --cpus-per-task=8 job
while the one above is running, it gets blocked with REASON (Resources):
# scontrol show node foobar
NodeName=larkin Arch=x86_64 CoresPerSocket=16
CPUAlloc=8 CPUEfctv=14 CPUTot=64 CPULoad=0.26
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=gpu:nvidia_rtx_a6000:1(S:0)
NodeAddr=foobar NodeHostName=foobar Version=22.05.6
OS=Linux 4.18.0-425.3.1.el8.x86_64 #1 SMP Wed Nov 9 20:13:27 UTC 2022
RealMemory=256312 AllocMem=25600 FreeMem=252968 Sockets=2 Boards=1
CoreSpecCount=25 CPUSpecList=14-63 MemSpecLimit=32768
State=MIXED ThreadsPerCore=2 TmpDisk=6000000 Weight=1 Owner=N/A MCS_label=N/A
Partitions=basic,GPU
BootTime=2022-12-09T17:40:02 SlurmdStartTime=2022-12-09T17:43:01
LastBusyTime=2022-12-09T17:43:01
CfgTRES=cpu=14,mem=256312M,billing=23,gres/gpu=1
AllocTRES=cpu=8,mem=25G
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
I see here that CfgTRES has cpu=14, so it appears that SLURM is
interpreting CPUSpecList=14-63 as me telling it that it can
use only 14 CPUs.
What is the syntax for CPUSpecList to do what I want: limit
SLURM to CPUs 14-63 and have it see 50 CPUs it can use?
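As a sanity check on the numbers in the scontrol output, here is a small shell sketch that counts the CPUs in a list. The count_cpus helper is hypothetical (it is not a Slurm tool), and the total of 64 is taken from CPUTot above.

```shell
# Hypothetical helper: count the CPUs in a list such as "14-63" or "0-3,32-35".
count_cpus() (
    IFS=,                      # split the list on commas (subshell keeps this local)
    total=0
    for range in $1; do
        lo=${range%-*}         # text before the dash (or the whole item if no dash)
        hi=${range#*-}         # text after the dash (or the whole item if no dash)
        total=$((total + hi - lo + 1))
    done
    echo "$total"
)

total_cpus=64                  # CPUTot in the scontrol output
spec=$(count_cpus 14-63)
echo "CPUs listed in CpuSpecList=14-63: $spec"
echo "CPUs left over: $((total_cpus - spec))"
echo "Cores covered at 2 threads per core: $((spec / 2))"
```

The list 14-63 contains 50 CPUs, which leaves 14 and covers 25 cores at two threads each. Those figures line up with CfgTRES cpu=14 and CoreSpecCount=25, which would be consistent with the listed CPUs being counted as set aside rather than as usable, though that is only an inference from this output.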
---------------------------------------------------------------
Paul Raines http://help.nmr.mgh.harvard.edu
MGH/MIT/HMS Athinoula A. Martinos Center for Biomedical Imaging
149 (2301) 13th Street Charlestown, MA 02129 USA