[slurm-users] CPUSpecList confusion

Paul Raines raines at nmr.mgh.harvard.edu
Fri Dec 9 23:09:57 UTC 2022


I have a Rocky 8 system that, with hyperthreading, has 64 logical CPUs
(32 physical cores).

I want the first 14 CPUs (0-13) reserved for logged-in users and non-Slurm
work.  I want Slurm to use the rest.

I configured the box to boot with systemd.unified_cgroup_hierarchy=1
to use cgroup v2

I ran

systemctl set-property user.slice AllowedCPUs=0-13
systemctl set-property user.slice MemoryHigh=32768K

and this does work to make all system processes and the user logins
stick to cores 0 - 13

# cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective
0-63
# cat /sys/fs/cgroup/user.slice/cpuset.cpus.effective
0-13


$ ssh raines@foobar
$ grep -i ^cpu /proc/self/status
Cpus_allowed:   00000000,00003fff
Cpus_allowed_list:      0-13
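
For readers who want to check a Cpus_allowed mask by hand, here is a minimal sketch that expands it into a CPU list. It assumes the usual /proc layout (comma-separated 32-bit hex words, most significant word first); the helper names mask_to_cpus and to_ranges are made up for illustration.

```python
def mask_to_cpus(mask: str) -> list[int]:
    """Expand a Cpus_allowed hex mask into a sorted list of CPU numbers.

    Assumes comma-separated 32-bit words, most significant word first.
    """
    value = int(mask.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def to_ranges(cpus: list[int]) -> str:
    """Format sorted CPU numbers as a range list such as '0-3,32-35'."""
    parts = []
    start = prev = None
    for cpu in cpus:
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            parts.append(f"{start}-{prev}" if start != prev else str(start))
            start = prev = cpu
    if start is not None:
        parts.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(parts)


# Mask reported for the ssh login session above.
print(to_ranges(mask_to_cpus("00000000,00003fff")))  # 0-13
```

The 14 low bits of 0x3fff correspond to CPUs 0-13, which agrees with the Cpus_allowed_list line.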


I then set up Slurm with my node defined as

Nodename=foobar \
   CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 \
   RealMemory=256312 MemSpecLimit=32768 CpuSpecList=14-63 \
   TmpDisk=6000000 Gres=gpu:nvidia_rtx_a6000:1

The slurm.conf also has:

ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup
TaskPluginParam=Cores,SlurmdOffSpec,Verbose

But when I submitted jobs, they would still be assigned to
cores in the 0-13 range

$ srun -p basic -N 1 --ntasks-per-node=1 --mem=25G \
--time=10:00:00 --cpus-per-task=8 --pty /bin/bash
$ grep -i ^cpu /proc/self/status 
Cpus_allowed:   0000000f,0000000f
Cpus_allowed_list:      0-3,32-35
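
To make the conflict explicit, the sketch below compares the job's CPU list with the range given to user.slice. The sibling offset of 32 is an assumption inferred from the 0-3,32-35 pattern (CPU n and CPU n+32 appearing to share a core); it is not something read from the node.

```python
RESERVED = set(range(0, 14))               # CPUs 0-13 given to user.slice
JOB_CPUS = [0, 1, 2, 3, 32, 33, 34, 35]    # Cpus_allowed_list of the srun shell
SIBLING_OFFSET = 32                        # assumption: CPU n and n+32 share a core

# Collapse hyperthread siblings onto one core index each.
cores = sorted({cpu % SIBLING_OFFSET for cpu in JOB_CPUS})

# CPUs the job was given that fall inside the range meant for logins.
overlap = sorted(RESERVED.intersection(JOB_CPUS))

print(cores)    # [0, 1, 2, 3]
print(overlap)  # [0, 1, 2, 3]
```

Under that assumption the 8 CPUs are both threads of 4 cores, and 4 of the 8 fall inside the 0-13 range that was supposed to be off limits to jobs.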


I also noticed that if I submit another --cpus-per-task=8 job
while the one above is running, it stays pending with REASON (Resources)


# scontrol show node foobar
NodeName=larkin Arch=x86_64 CoresPerSocket=16
    CPUAlloc=8 CPUEfctv=14 CPUTot=64 CPULoad=0.26
    AvailableFeatures=(null)
    ActiveFeatures=(null)
    Gres=gpu:nvidia_rtx_a6000:1(S:0)
    NodeAddr=foobar NodeHostName=foobar Version=22.05.6
    OS=Linux 4.18.0-425.3.1.el8.x86_64 #1 SMP Wed Nov 9 20:13:27 UTC 2022
    RealMemory=256312 AllocMem=25600 FreeMem=252968 Sockets=2 Boards=1
    CoreSpecCount=25 CPUSpecList=14-63 MemSpecLimit=32768
    State=MIXED ThreadsPerCore=2 TmpDisk=6000000 Weight=1 Owner=N/A MCS_label=N/A
    Partitions=basic,GPU
    BootTime=2022-12-09T17:40:02 SlurmdStartTime=2022-12-09T17:43:01
    LastBusyTime=2022-12-09T17:43:01
    CfgTRES=cpu=14,mem=256312M,billing=23,gres/gpu=1
    AllocTRES=cpu=8,mem=25G
    CapWatts=n/a
    CurrentWatts=0 AveWatts=0
    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

I see here that CfgTRES has cpu=14, so it appears that Slurm is
processing CpuSpecList=14-63 as me telling it that it can
use only 14 CPUs.

What is the syntax for CpuSpecList to do what I want: limit
Slurm to CPUs 14-63, so that it sees 50 CPUs it can use?
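
One reading that fits the numbers in the scontrol output, sketched below, is that CpuSpecList names the CPUs set aside for system use (which is how the slurm.conf man page describes it) rather than the CPUs Slurm may schedule on. The helper parse_cpu_list is hypothetical and exists only for this arithmetic.

```python
def parse_cpu_list(text: str) -> set[int]:
    """Expand a range list such as '14-63' or '0-3,32-35' into a set of CPUs."""
    cpus = set()
    for part in text.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


total_cpus = 64          # CPUTot=64
threads_per_core = 2     # ThreadsPerCore=2

spec = parse_cpu_list("14-63")                 # CpuSpecList as configured
spec_cores = len(spec) // threads_per_core     # compare with CoreSpecCount
left_for_jobs = total_cpus - len(spec)         # compare with CPUEfctv / CfgTRES cpu

print(len(spec), spec_cores, left_for_jobs)    # 50 25 14
```

The results line up with CoreSpecCount=25 and CPUEfctv=14 above. If that reading is right, the list would need to name the reserved CPUs instead of the ones meant for jobs. The man page also describes the entries as Slurm abstract CPU IDs, which need not match the kernel's numbering, so the exact list is worth verifying on the node.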


---------------------------------------------------------------
Paul Raines                     http://help.nmr.mgh.harvard.edu
MGH/MIT/HMS Athinoula A. Martinos Center for Biomedical Imaging
149 (2301) 13th Street     Charlestown, MA 02129	    USA


