[slurm-users] OverMemoryKill Not Working?

Mike Mosley Mike.Mosley at uncc.edu
Fri Oct 25 13:17:54 UTC 2019


Ahmet,

Thank you for taking the time to respond to my question.

Yes, the --mem=1GBB is a typo.   It's correct in my script, I just
fat-fingered it in the email. :-)

BTW, the exact version I am using is 19.05.2.

Regarding your response, it seems that that might be more than what I
need.   I simply want to enforce the memory limits as specified by the user
at job submission time.   This seems to have been the behavior in previous
versions of Slurm.   What I want is what is described in the 19.05 release
notes:



RELEASE NOTES FOR SLURM VERSION 19.05
28 May 2019

NOTE: slurmd and slurmctld will now fatal if two incompatible mechanisms for
      enforcing memory limits are set. This makes incompatible the use of
      task/cgroup memory limit enforcing (Constrain[RAM|Swap]Space=yes) with
      JobAcctGatherParams=OverMemoryKill, which could cause problems when a
      task is killed by one of them while the other is at the same time
      managing that task. The NoOverMemoryKill setting has been deprecated in
      favor of OverMemoryKill, since now the default is *NOT* to have any
      memory enforcement mechanism.

NOTE: MemLimitEnforce parameter has been removed and the functionality that
      was provided with it has been merged into a JobAcctGatherParams. It
      may be enabled by setting JobAcctGatherParams=OverMemoryKill, so now
      job and steps killing by OOM is enabled from the same place.



So, is it really necessary to do what you suggested to get that
functionality?

If someone could post just a simple slurm.conf file that forces the memory
limits to be honored (and kills the job if they are exceeded), then I could
extract what I need from that.
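
A minimal sketch of such a fragment, combining the OverMemoryKill setting
from the release notes above with the memory-tracking settings suggested in
the quoted reply below. It is untested here; the choice of CR_Core_Memory
and the 30-second sampling interval are assumptions to adapt per site:

```
# slurm.conf fragment (sketch, untested; adapt to your site)

# Treat memory as a consumable resource so that --mem is tracked per job.
# CR_Core_Memory is one of the four options listed in the quoted reply.
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

# Sample job memory usage through the accounting-gather plugin and kill
# jobs or steps found above their requested memory at sampling time.
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherParams=OverMemoryKill

# Assumed value: sample task usage every 30 seconds. A job can exceed its
# limit for up to one interval before it is noticed.
JobAcctGatherFrequency=task=30

# Per the release notes, do not combine this with task/cgroup memory
# enforcement (ConstrainRAMSpace=yes or ConstrainSwapSpace=yes in
# cgroup.conf); slurmd and slurmctld will fatal if both are set.
```

Because enforcement in this setup happens only when usage is sampled, a
short-lived spike between samples may go unnoticed.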

Again, thanks for the assistance.

Mike



On Thu, Oct 24, 2019 at 11:27 PM mercan <ahmet.mercan at uhem.itu.edu.tr>
wrote:

> Hi;
>
> You should set
>
> SelectType=select/cons_res
>
> plus one of these:
>
> SelectTypeParameters=CR_Memory
> SelectTypeParameters=CR_Core_Memory
> SelectTypeParameters=CR_CPU_Memory
> SelectTypeParameters=CR_Socket_Memory
>
> to enable memory allocation tracking, according to the documentation:
>
> https://slurm.schedmd.com/cons_res_share.html
>
> Also, the line:
>
> #SBATCH --mem=1GBB
>
> contains "1GBB". Is it the same in the job script?
>
>
> Regards;
>
> Ahmet M.
>
>
> On 24.10.2019 at 23:00, Mike Mosley wrote:
> > Hello,
> >
> > We are testing Slurm 19.05 on Linux RHEL7.5+ with the intent to migrate
> > to it from Torque/Moab in the near future.
> >
> > One of the things our users are used to is that when their jobs exceed
> > the amount of memory they requested, the job is terminated by the
> > scheduler.   We realize that Slurm prefers to use cgroups to contain
> > rather than kill the jobs but initially we need to have the kill
> > option in place to transition our users.
> >
> > So, looking at the documentation, it appears that in 19.05, the
> > following needs to be set to accomplish this:
> >
> > JobAcctGatherParams = OverMemoryKill
> >
> >
> > Other possibly relevant settings we made:
> >
> > JobAcctGatherType = jobacct_gather/linux
> >
> > ProctrackType = proctrack/linuxproc
> >
> >
> > We have avoided configuring any cgroup parameters for the time being.
> >
> > Unfortunately, when we submit a job with the following:
> >
> > #SBATCH --nodes=1
> >
> > #SBATCH --ntasks-per-node=1
> >
> > #SBATCH --mem=1GBB
> >
> >
> > We see the RSS of the job steadily increase beyond the 1GB limit and it is
> > never killed.    Interestingly enough, the proc information shows the
> > ulimit (hard and soft) for the process set to around 1GB.
> >
> > We have tried various settings without any success.   Can anyone point
> > out what we are doing wrong?
> >
> > Thanks,
> >
> > Mike
> >
> > --
> > J. Michael Mosley
> > University Research Computing
> > The University of North Carolina at Charlotte
> > 9201 University City Blvd
> > Charlotte, NC  28223
> > 704.687.7065    jmmosley at uncc.edu <mmosley at uncc.edu>
>


-- 
J. Michael Mosley
University Research Computing
The University of North Carolina at Charlotte
9201 University City Blvd
Charlotte, NC  28223
704.687.7065    jmmosley at uncc.edu <mmosley at uncc.edu>