Hello,

Background:
I am working on a small cluster managed by Base Command Manager v10.0, running Slurm 23.02.7 on Ubuntu 22.04.2. I have a small test script that simply consumes memory and processors.
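(The script itself is not important; the behaviour reproduces with something as simple as the sketch below. The --mem value and allocation size here are illustrative only; my actual job was given the node's full allocatable memory, 10868 MB per the slurmd log further down.)

$ sbatch --mem=4G --wrap="python3 -c 'data = bytearray(6 * 1024**3); import time; time.sleep(60)'"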

When I run the test script, it consumes more memory than Slurm allocated and, as expected, it is killed by the OOM killer. In /var/log/slurmd I see entries like:

[2024-05-29T08:53:04.975] Launching batch job 65 for UID 1001
[2024-05-29T08:53:05.016] [65.batch] task/cgroup: _memcg_initialize: job: alloc=10868MB mem.limit=10868MB memsw.limit=10868MB job_swappiness=1
[2024-05-29T08:53:05.016] [65.batch] task/cgroup: _memcg_initialize: step: alloc=10868MB mem.limit=10868MB memsw.limit=10868MB job_swappiness=1
[2024-05-29T08:53:19.530] [65.batch] task/cgroup: task_cgroup_memory_check_oom: StepId=65.batch hit memory+swap limit at least once during execution. This may or may not result in some failure.
[2024-05-29T08:53:19.563] [65.batch] done with job
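
For what it's worth, the kernel log on the compute node can be used to confirm that the kill came from the cgroup OOM killer (I have not pasted the output here, but can do so if it helps):

~# dmesg -T | grep -iE "oom|out of memory"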

Inspecting with sacct, I see:
$ sacct -j 65 --format="jobid,jobname,state,exitcode"
JobID           JobName      State ExitCode
------------ ---------- ---------- --------
65                 wrap     FAILED      9:0
65.batch          batch     FAILED      9:0

Based on my previous experience with a RHEL-based Slurm cluster, I would expect the state to be reported as OUT_OF_MEMORY and the exit code to be 0:125.
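
That is, for the same kind of job I would expect output along these lines (mocked up for illustration, not real output; NNN stands for the job id):

$ sacct -j NNN --format="jobid,jobname,state,exitcode"
JobID           JobName      State ExitCode
------------ ---------- ---------- --------
NNN                wrap OUT_OF_ME+    0:125
NNN.batch         batch OUT_OF_ME+    0:125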

Question:
1. How do I configure slurm.conf and/or cgroup.conf so that, when Slurm kills a job for exceeding its allocated memory, sacct reports the state as OUT_OF_MEMORY? (The settings I suspect are relevant are summarized just below.)
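
For reference, these are the settings that seem most relevant to me, pulled out of the full configs below (I may well be missing the important one):

# slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
JobAcctGatherType=jobacct_gather/cgroup

# cgroup.conf
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
AllowedSwapSpace=0.00
MemorySwappiness=1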



Configuration:
/etc/default/grub:
~# grep -v "^#" /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX="biosdevname=0 cgroup_enable=memory swapaccount=1"
GRUB_GFXMODE="1024x768,800x600,auto"
GRUB_BACKGROUND="/boot/grub/bcm.png"
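
(In case it matters for the answer: I believe the cgroup hierarchy actually mounted on the nodes can be checked with the command below, where cgroup2fs indicates the unified v2 hierarchy and tmpfs the legacy/hybrid layout. I can post the output if it helps.)

~# stat -fc %T /sys/fs/cgroup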

slurm.conf:
~# grep -v "^#" /cm/shared/apps/slurm/var/etc/bcm10-slurm/slurm.conf
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
SlurmdSpoolDir=/cm/local/apps/slurm/var/spool
SwitchType=switch/none
MpiDefault=pmix
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/cgroup
ReturnToService=2
TaskPlugin=task/cgroup
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=300
SlurmdTimeout=300
Waittime=0
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd
JobAcctGatherType=jobacct_gather/cgroup      
JobAcctGatherFrequency=30
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageUser=slurm
SlurmctldHost=bcm10-h01
AccountingStorageHost=master
NodeName=bcm10-n[01,02] Procs=4 CoresPerSocket=4 RealMemory=15988 SocketsPerBoard=1 ThreadsPerCore=1 Boards=1 MemSpecLimit=5120 Feature=location=local
PartitionName="defq" Default=YES MinNodes=1 DefaultTime=UNLIMITED MaxTime=UNLIMITED AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 OverSubscribe=NO PreemptMode=OFF AllowAccounts=ALL AllowQos=ALL Nodes=bcm10-n[01,02]
ClusterName=bcm10-slurm
SchedulerType=sched/backfill
StateSaveLocation=/cm/shared/apps/slurm/var/cm/statesave/bcm10-slurm
PrologFlags=Alloc
GresTypes=gpu
Prolog=/cm/local/apps/cmd/scripts/prolog
Epilog=/cm/local/apps/cmd/scripts/epilog
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory

cgroup.conf:
~# grep -v "^#" /cm/shared/apps/slurm/var/etc/bcm10-slurm/cgroup.conf
CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=no
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
ConstrainDevices=yes
AllowedRamSpace=100.00
AllowedSwapSpace=0.00
MemorySwappiness=1
MaxRAMPercent=100.00
MaxSwapPercent=100.00
MinRAMSpace=30

Best regards,
Lee