[slurm-users] memory limits:: why job is not killed but oom-killer steps up?

Hermann Schwärzler hermann.schwaerzler at uibk.ac.at
Thu Jan 13 08:59:19 UTC 2022


Hi Adrian,

ConstrainRAMSpace=yes

has the effect that when the memory the job requested is exhausted, the
processes of the job start paging/swapping.

If you want to stop jobs that use more memory (RSS, to be precise) than
they requested, you have to add this to your cgroup.conf:

ConstrainSwapSpace=yes
AllowedSwapSpace=0
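
For completeness, the whole cgroup.conf would then look something like
this (your existing settings plus the two lines above):

CgroupAutomount=yes
TaskAffinity=no
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
AllowedSwapSpace=0

AllowedSwapSpace is a percentage of the allocated memory that may be used
as swap on top of it, so with 0 the job gets no swap at all and going over
the requested memory stops the job instead of letting it page.

You can check the behaviour with a small job that deliberately allocates
more than it asked for, for example (just one possible test):

srun --mem=100 python3 -c "x = bytearray(500 * 1024 * 1024)"

which should now get terminated instead of swapping.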

Regards,
Hermann

On 1/12/22 11:04 PM, Adrian Sevcenco wrote:
> 
> Hi! I have a problem with enforcing the memory limits...
> I'm using cgroups to enforce the limits and I had expected that when the
> cgroup memory limits are reached the job is killed ...
> Instead I see in the log a lot of oom-killer reports that act only on a
> certain process from the cgroup ...
> 
> Did I miss anything in my configuration? I have the following:
> 
> SelectType=select/cons_res
> SelectTypeParameters=CR_CPU_MEMORY,CR_LLN
> 
> the partition has:
> DefMemPerCPU=3950 MaxMemPerCPU=4010  (I understood that these are MiB,
> and physically I have 4 GiB/thread)
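> 
> For illustration, these per-partition limits sit on the PartitionName line
> in slurm.conf, roughly like this (partition and node names here are just
> placeholders):
> 
> PartitionName=batch Nodes=node[01-04] DefMemPerCPU=3950 MaxMemPerCPU=4010 State=UP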
> 
> cat cgroup.conf
> CgroupAutomount=yes
> TaskAffinity=no
> ConstrainCores=yes
> ConstrainRAMSpace=yes
> 
> ProctrackType=proctrack/cgroup
> 
> JobAcctGatherType=jobacct_gather/linux
> JobAcctGatherFrequency=task=15,filesystem=120
> JobAcctGatherParams=UsePss
> 
> TaskPlugin=task/affinity,task/cgroup
> TaskPluginParam=autobind=threads
> 
> Is there a problem with my expectation that I should not see the
> oom-killer, or with my configuration?
> 
> Thank you!
> Adrian
> 


