[slurm-users] memory limit questions

Paul Raines raines at nmr.mgh.harvard.edu
Fri Jul 13 10:12:39 MDT 2018


I am trying Slurm for the first time on some test machines: version 17.02.7
on CentOS 7.5 boxes.

Relevant lines from my slurm.conf

ProctrackType=proctrack/cgroup
SwitchType=switch/none
PropagateResourceLimitsExcept=MEMLOCK
TaskPlugin=task/cgroup
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
FastSchedule=1
PriorityType=priority/multifactor
PriorityWeightAge=1000
PriorityWeightPartition=10000
JobAcctGatherType=jobacct_gather/cgroup
JobAcctGatherFrequency=30
NodeName=icenode00 Procs=6 State=UNKNOWN RealMemory=62000
...
NodeName=icenode05 Procs=6 State=UNKNOWN RealMemory=62000
PartitionName=nmrdef Nodes=icenode[00,02,03,04,05] Default=YES MaxTime=7-00:00:00 DefaultTime=3-00:00:00 State=UP PriorityJobFactor=1000 LLN=Yes
PartitionName=p6 Nodes=icenode[00,02,03,04,05] MaxTime=7-00:00:00 DefaultTime=3-00:00:00 State=UP PriorityJobFactor=6000 LLN=Yes
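
(For what it is worth, I have not pasted it here, but I assume I can confirm
what the running daemons actually loaded with something like

   $ scontrol show config | grep -iE 'ProctrackType|TaskPlugin|SelectType|JobAcctGather'

if that is useful.)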

And cgroup.conf has

CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
AllowedRAMSpace=95.0
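
If I am reading the cgroup.conf docs right, with ConstrainRAMSpace=yes the
job's cgroup memory limit should be AllowedRAMSpace percent of the allocated
memory, so a --mem=20G job would get a limit of roughly 19G.  I assume I can
check what limit actually got applied by looking at the job's memory cgroup
on the node while it runs; on these CentOS 7 (cgroup v1) boxes I believe the
path is something like

   cat /sys/fs/cgroup/memory/slurm/uid_<uid>/job_<jobid>/memory.limit_in_bytes

though I have not verified the exact layout.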

I submit my first "real" job like this and it runs fine

$ sbatch -N 1 --ntasks-per-node=1 -c 4 --mem=20G -p p6 -o test1.out mriqctest.sh
Submitted batch job 23
$ sacct -j 23 -o reqmem,maxvmsize,maxrss,exitcode,totalCPU,elapsed
     ReqMem  MaxVMSize     MaxRSS ExitCode   TotalCPU    Elapsed
---------- ---------- ---------- -------- ---------- ----------
       20Gn                            0:0  00:15.254   00:01:18
       20Gn    140504K    357712K      0:0  00:15.254   00:01:18


I don't understand why sacct prints two different lines for the job, but
whatever.  What confuses me more is how MaxRSS can be greater than MaxVMSize.
I am guessing it is just a sampling/timing issue, since the job ran for less
than two minutes.
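
(I suppose rerunning the query with the JobID and JobName fields added, e.g.

   $ sacct -j 23 -o jobid,jobname,reqmem,maxvmsize,maxrss,exitcode

would at least show which record each line belongs to, but I have not dug
into that yet.)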

Anyway, I want to test the memory limit constraint, so next I submit with

$ sbatch -N 1 --ntasks-per-node=1 -c 1 --mem=100M -p p6 -o test2.out mriqctest.sh
Submitted batch job 24

Running top shows the mriqc process using a ton of CPU for over 15 minutes,
with zero output written to test2.out.  Finally I kill it:

[root@icestorm ~]# sacct -j 24 -o reqmem,maxvmsize,maxrss,exitcode,totalCPU,elapsed
     ReqMem  MaxVMSize     MaxRSS ExitCode   TotalCPU    Elapsed
---------- ---------- ---------- -------- ---------- ----------
      100Mn                           15:0  19:41.284   00:19:43
      100Mn    140504K     45656K     15:0  19:41.284   00:19:43

At that point I see in test2.out

/var/spool/slurm/d/job00024/slurm_script: line 15:  8854 Terminated 
singularity exec -B $PWD/ds008_R2.0.0:/data:ro -B $PWD/out$1:/out 
/usr/pubsw/packages/mriqc/current/mriqc.simg mriqc --no-sub /data /out 
participant --participant_label sub-15
slurmstepd: error: Exceeded step memory limit at some point.
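
Next time I run this I plan to watch the job's memory cgroup on the node
while it is running, e.g. (again assuming the cgroup v1 layout guessed above)

   cat /sys/fs/cgroup/memory/slurm/uid_<uid>/job_<jobid>/memory.max_usage_in_bytes
   cat /sys/fs/cgroup/memory/slurm/uid_<uid>/job_<jobid>/memory.failcnt

to see whether the kernel is refusing allocations rather than OOM-killing
anything.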

So my question is: why was the job not killed for exceeding its memory
limit, and why did it burn through CPU seemingly forever like it did?

I don't have the source code for the process I am running, but could it be
that some loop with a malloc() in it keeps retrying when the allocation
fails instead of failing cleanly?  But if that were the case, why would
slurmstepd report that the step memory limit was exceeded, and if it saw
that, why did it not kill the process?
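
To take mriqc out of the picture, I may also try reproducing this with a
trivial memory hog under the same limit, e.g. something like

   $ sbatch -N 1 -c 1 --mem=100M -p p6 -o hogtest.out --wrap 'tail /dev/zero'

(hogtest.out is just a scratch output file, and I am assuming the usual
tail-/dev/zero trick eats memory the way I expect) and see whether that job
gets killed promptly by the cgroup limit.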

---------------------------------------------------------------
Paul Raines                     http://help.nmr.mgh.harvard.edu
MGH/MIT/HMS Athinoula A. Martinos Center for Biomedical Imaging
149 (2301) 13th Street     Charlestown, MA 02129	    USA




