[slurm-users] slurm, memory accounting and memory mapping

Sergey Koposov skoposov at cmu.edu
Fri Jan 11 16:49:50 UTC 2019


Hi Janne, 
On Fri, 2019-01-11 at 10:37 +0200, Janne Blomqvist wrote:
> On 11/01/2019 08.29, Sergey Koposov wrote:
> > What is your memory limit configuration in slurm? Anyway, a few things to check:
I guess the most relevant (uncommented) params I could find in slurm.conf are:

SelectTypeParameters=CR_Core_Memory
JobAcctGatherType=jobacct_gather/linux
TaskPlugin=task/affinity
> - Make sure you're not limiting RLIMIT_AS in any way (e.g. run "ulimit -v" in your batch script, ensure it's unlimited. In the slurm config, ensure
> VSizeFactor=0).
No, it is clearly not a ulimit issue, as I'm using essentially my PBS script
that worked fine before, plus I'm seeing these kinds of errors:
slurmstepd: error: Job 134 exceeded memory limit (146371328 > 131072000), being killed
slurmstepd: error: *** JOB 134 ON compute-1-26 CANCELLED AT 2019-01-11T03:22:03 **
The VSizeFactor option is commented out.
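To put a number on how far over the limit the job went, here is a minimal Python sketch that parses the slurmstepd line quoted above and computes the used/limit ratio. The regular expression and the helper name parse_overrun are my own assumptions based only on this one log line, not a documented Slurm log format, and the ratio is independent of whatever unit slurmstepd reports.

```python
import re

# Error line copied verbatim from the job output quoted above.
LINE = ("slurmstepd: error: Job 134 exceeded memory limit "
        "(146371328 > 131072000), being killed")

# Assumed pattern, inferred from the single line above.
PATTERN = re.compile(r"Job (\d+) exceeded memory limit \((\d+) > (\d+)\)")


def parse_overrun(line):
    """Return (job_id, used, limit, used / limit), or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    job_id, used, limit = (int(group) for group in match.groups())
    return job_id, used, limit, used / limit


result = parse_overrun(LINE)
if result is not None:
    job_id, used, limit, ratio = result
    print(f"job {job_id}: used {used}, limit {limit}, ratio {ratio:.3f}")
```

For this line the ratio is about 1.12, so the accounted usage is only around 12% above the request, which seems at least compatible with a shared mapping being counted more than once rather than a runaway allocation.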

> - Are you using task/cgroup for limiting memory? In that case the problem might be that cgroup memory limits work with RSS, and as you're running multiple
> processes the shared mmap'ed file will be counted multiple times. There's no really good way around this, but with, say, something like
> 
> ConstrainRAMSpace=no
> ConstrainSwapSpace=yes
> AllowedRAMSpace=100
> AllowedSwapSpace=1600
> you'll get a setup where the cgroup soft limit will be set to the amount your job allocates, but the hard limit (where the job will be killed) will be set to
> 1600% of that.
> - If you're using cgroups for memory limits, you should also set JobAcctGatherParams=NoOverMemoryKill
> - If you're NOT using cgroups for memory limits, try setting JobAcctGatherParams=UsePSS which should avoid counting the shared mappings multiple times.
(not sure if cgroup is used currently..) But thanks for the suggestions. We'll try those and report back.
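To illustrate why RSS and PSS differ for a shared mmap'ed file, here is a rough Python sketch that sums the Rss and Pss fields from smaps-formatted text. The sample excerpt, its path, and all of its numbers are invented for illustration (one process mapping a 1 GiB file shared with three other processes); this is not how Slurm itself gathers accounting data, only a sketch of the two metrics.

```python
import re

# Hypothetical /proc/<pid>/smaps excerpt; path and values are made up.
# A 1 GiB file mapped by 4 processes shows the full size as Rss in each,
# but only a quarter of it as Pss.
SAMPLE_SMAPS = """\
7f0000000000-7f0040000000 r--s 00000000 08:01 1234 /data/shared.bin
Size:            1048576 kB
Rss:             1048576 kB
Pss:              262144 kB
7ffd00000000-7ffd00021000 rw-p 00000000 00:00 0 [stack]
Size:                132 kB
Rss:                  64 kB
Pss:                  64 kB
"""

# Match only the plain "Rss:" and "Pss:" lines (not e.g. "SwapPss:").
FIELD = re.compile(r"^(Rss|Pss):\s+(\d+) kB$", re.MULTILINE)


def sum_rss_pss(smaps_text):
    """Return (total Rss, total Pss) in kB over all mappings in the text."""
    totals = {"Rss": 0, "Pss": 0}
    for name, kb in FIELD.findall(smaps_text):
        totals[name] += int(kb)
    return totals["Rss"], totals["Pss"]


rss_kb, pss_kb = sum_rss_pss(SAMPLE_SMAPS)
nproc = 4  # assumed number of identical worker processes
print(f"per process: Rss={rss_kb} kB, Pss={pss_kb} kB")
print(f"summed over {nproc} processes: Rss={nproc * rss_kb} kB, Pss={nproc * pss_kb} kB")
```

On a Linux node the same function could be fed the contents of /proc/<pid>/smaps for each process of the job; summing Rss counts the shared file once per process, while summing Pss counts it roughly once in total.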

Regards, 
         Sergey


