[slurm-users] memory limits: why is the job not killed, but the oom-killer steps in?
Adrian Sevcenco
Adrian.Sevcenco at spacescience.ro
Thu Jan 13 10:50:08 UTC 2022
On 13.01.2022 10:59, Hermann Schwärzler wrote:
> Hi Adrian,
Hi!
> ConstrainRAMSpace=yes
>
> has the effect that when the memory the job requested is exhausted, the job's processes will start paging/swapping.
>
> If you want to stop jobs that use more memory (RSS to be precise) than they requested, you have to add this to your
> cgroup.conf:
>
> ConstrainSwapSpace=yes
> AllowedSwapSpace=0
ooh, thanks a lot!!! Now I see that only AllowedSwapSpace has the comment:
"If the limit is exceeded, the job steps will be killed"
Thanks a lot!!
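So if I read that right, my cgroup.conf should presumably end up looking like this (your two lines merged into what I already have; not tested on my side yet):

CgroupAutomount=yes
TaskAffinity=no
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
AllowedSwapSpace=0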
Adrian
>
> Regards,
> Hermann
>
> On 1/12/22 11:04 PM, Adrian Sevcenco wrote:
>>
>> Hi! I have a problem with enforcing the memory limits...
>> I'm using cgroups to enforce the limits and I had expected that when
>> the cgroup memory limits are reached the job is killed...
>> Instead I see in the log a lot of oom-killer reports that act only on a certain process
>> from the cgroup...
>>
>> Did I miss anything in my configuration? I have the following:
>>
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_CPU_MEMORY,CR_LLN
>>
>> the partition has:
>> DefMemPerCPU=3950 MaxMemPerCPU=4010 (I understand these are MiB, and physically I have 4 GiB/thread)
>>
>> cat cgroup.conf
>> CgroupAutomount=yes
>> TaskAffinity=no
>> ConstrainCores=yes
>> ConstrainRAMSpace=yes
>>
>> ProctrackType=proctrack/cgroup
>>
>> JobAcctGatherType=jobacct_gather/linux
>> JobAcctGatherFrequency=task=15,filesystem=120
>> JobAcctGatherParams=UsePss
>>
>> TaskPlugin=task/affinity,task/cgroup
>> TaskPluginParam=autobind=threads
>>
>> Is there a problem with my expectation that I should not see the oom-killer,
>> or with my configuration?
>>
>> Thank you!
>> Adrian
>>
>
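P.S. In case it helps anyone finding this thread later: a quick way to check whether the limits from the config quoted above actually land on a job. Here <jobid> is a placeholder and the cgroup paths assume cgroup v1 with Slurm's usual hierarchy, so adjust for your setup:

# what the running configuration reports for memory-related settings
scontrol show config | grep -i mem

# requested memory vs. peak RSS for a finished job
sacct -j <jobid> -o JobID,ReqMem,MaxRSS,State

# the cgroup limits actually applied to a running job (cgroup v1 layout)
cat /sys/fs/cgroup/memory/slurm/uid_$UID/job_<jobid>/memory.limit_in_bytes
cat /sys/fs/cgroup/memory/slurm/uid_$UID/job_<jobid>/memory.memsw.limit_in_bytes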
--
----------------------------------------------
Adrian Sevcenco, Ph.D. |
Institute of Space Science - ISS, Romania |
adrian.sevcenco at {cern.ch,spacescience.ro} |
----------------------------------------------