[slurm-users] OverMemoryKill Not Working?

mercan ahmet.mercan at uhem.itu.edu.tr
Fri Oct 25 03:27:29 UTC 2019


Hi;

You should set

SelectType=select/cons_res

plus one of these:

SelectTypeParameters=CR_Memory
SelectTypeParameters=CR_Core_Memory
SelectTypeParameters=CR_CPU_Memory
SelectTypeParameters=CR_Socket_Memory

to enable memory allocation tracking, according to the documentation:

https://slurm.schedmd.com/cons_res_share.html
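
To confirm that both settings are in place, you can filter the configuration text for them. This is only a rough sketch: has_memory_tracking is a hypothetical helper (not a Slurm command), and it only looks at the two options named above.

```shell
# Hypothetical helper (not part of Slurm): reads configuration text on stdin
# and succeeds only if both memory-tracking settings are present.
has_memory_tracking() {
    conf=$(cat)
    # SelectType must be select/cons_res (commented-out lines do not match) ...
    printf '%s\n' "$conf" |
        grep -Eiq '^[[:space:]]*SelectType[[:space:]]*=[[:space:]]*select/cons_res' || return 1
    # ... and SelectTypeParameters must include one of the *_Memory options.
    printf '%s\n' "$conf" |
        grep -Eiq '^[[:space:]]*SelectTypeParameters[[:space:]]*=.*CR_((CPU|Core|Socket)_)?Memory'
}
```

It can be fed from the running daemon with "scontrol show config | has_memory_tracking && echo tracked", or from the file itself with "has_memory_tracking < /etc/slurm/slurm.conf" (that path is an assumption; use wherever your slurm.conf lives).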

Also, the line:

#SBATCH --mem=1GBB

contains "1GBB". Is it written the same way in the actual job script?
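
If that is a typo, a job script along these lines may be what was intended. This is a sketch that assumes the request was meant to be 1 GB; the sbatch documentation lists K, M, G and T as the unit suffixes for --mem. The echo line is only there for checking and is my addition, not part of the original script.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=1G

# Inside a job, Slurm exports SLURM_MEM_PER_NODE ("same as --mem" per the
# sbatch man page); outside a job the variable is unset, hence the fallback.
mem_request="${SLURM_MEM_PER_NODE:-not set}"
echo "Memory requested per node: ${mem_request}"
```

Checking the job output for that line is a quick way to see whether the memory request was parsed at all.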


Regards;

Ahmet M.


On 24.10.2019 at 23:00, Mike Mosley wrote:
> Hello,
>
> We are testing Slurm 19.05 on Linux RHEL 7.5+ with the intent to migrate
> to it from Torque/Moab in the near future.
>
> One of the things our users are used to is that when their jobs exceed
> the amount of memory they requested, the job is terminated by the
> scheduler. We realize that Slurm prefers to use cgroups to contain
> rather than kill the jobs, but initially we need to have the kill
> option in place to transition our users.
>
> So, looking at the documentation, it appears that in 19.05, the 
> following needs to be set to accomplish this:
>
> JobAcctGatherParams = OverMemoryKill
>
>
> Other possibly relevant settings we made:
>
> JobAcctGatherType = jobacct_gather/linux
>
> ProctrackType = proctrack/linuxproc
>
>
> We have avoided configuring any cgroup parameters for the time being.
>
> Unfortunately, when we submit a job with the following:
>
> #SBATCH --nodes=1
> #SBATCH --ntasks-per-node=1
> #SBATCH --mem=1GBB
>
>
> We see the RSS of the job steadily increase beyond the 1GB limit, and it is
> never killed. Interestingly enough, the proc information shows the
> ulimit (hard and soft) for the process set to around 1GB.
>
> We have tried various settings without any success.   Can anyone point 
> out what we are doing wrong?
>
> Thanks,
>
> Mike
>
> -- 
> J. Michael Mosley
> University Research Computing
> The University of North Carolina at Charlotte
> 9201 University City Blvd
> Charlotte, NC  28223
> 704.687.7065  mmosley at uncc.edu


