[slurm-users] OverMemoryKill Not Working?

Mike Mosley Mike.Mosley at uncc.edu
Fri Oct 25 15:29:15 UTC 2019


Mark,

Thanks for responding.

Yes, it will constrain the job to the amount of memory the user asked for.  In
fact, I have gotten that to work.

That is not the behavior that we want (at least initially).  The test code I
ran (which just allocates chunks of RAM in a loop) was constrained to the
amount of RAM I asked for, but it did not die for exceeding it.  (Which makes
sense, because it is being constrained.)  It would continue to run, and I
would watch its RSS grow to 100% of the allocation I requested, drop to about
75% of the allocation, climb back to 100%, and so on.
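
For reference, the test code is essentially just the following (a simplified
sketch of what I ran; the 256 MiB chunk size and the memset to touch the pages
are placeholder details, not the exact program):

/* alloc_test.c -- allocate RAM in a loop until something stops us. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const size_t chunk = 256UL * 1024 * 1024;   /* 256 MiB per iteration */
    size_t total_mib = 0;

    for (;;) {
        char *p = malloc(chunk);
        if (p == NULL) {
            fprintf(stderr, "malloc failed after %zu MiB\n", total_mib);
            return 1;
        }
        memset(p, 1, chunk);        /* touch every page so RSS actually grows */
        total_mib += chunk / (1024 * 1024);
        printf("allocated %zu MiB so far\n", total_mib);
    }
}

Under task/cgroup with ConstrainRAMSpace=yes, this is the program that shows
the RSS oscillation described above instead of being killed.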

Our users have to keep trying various memory allocations for the jobs they
run, and they would prefer that a job die when it runs out of memory rather
than just hang.
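
If I understand the 19.05 documentation correctly, the behavior we want would
look something like the fragment below, i.e. polling-based enforcement instead
of cgroup constraining (a sketch based on my reading of the slurm.conf man
page, not a configuration I have verified on our cluster):

slurm.conf:
# Kill the job when its polled RSS exceeds the requested memory,
# rather than constraining it with cgroups.
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
JobAcctGatherParams=OverMemoryKill
TaskPlugin=task/affinity

with jobs submitted with an explicit request, e.g. sbatch --mem=4G.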

Also, the Slurm release notes for 19.05 say that cgroup memory enforcement and
OverMemoryKill are not compatible, and that NoOverMemoryKill has been
deprecated:

https://slurm.schedmd.com/news.html

NOTE: slurmd and slurmctld will now fatal if two incompatible mechanisms for
      enforcing memory limits are set. This makes incompatible the use of
      task/cgroup memory limit enforcing (Constrain[RAM|Swap]Space=yes) with
      JobAcctGatherParams=OverMemoryKill, which could cause problems when a
      task is killed by one of them while the other is at the same time
      managing that task. The NoOverMemoryKill setting has been deprecated in
      favor of OverMemoryKill, since now the default is *NOT* to have any
      memory enforcement mechanism.
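
If I read that correctly, a combination like the following is exactly what
would now make slurmd/slurmctld refuse to start (my paraphrase of the note,
not something I have tested):

slurm.conf:   JobAcctGatherParams=OverMemoryKill
cgroup.conf:  ConstrainRAMSpace=yes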

Have you gotten this to work with 19.05?

Thanks,
Mike

On Fri, Oct 25, 2019 at 9:41 AM Mark Hahn <hahn at mcmaster.ca> wrote:

> > need.   I simply want to enforce the memory limits as specified by the user
> > at job submission time.   This seems to have been the behavior in previous
>
> but cgroups (with Constrain) do that all by themselves.
>
> > If someone could post just a simple slurm.conf file that forces the memory
> > limits to be honored (and kills the job if they are exceeded), then I could
> > extract what I need from that.
>
> slurm.conf:
> TaskPlugin=task/cgroup
> JobAcctGatherParams=NoOverMemoryKill
>
> cgroup.conf:
> ConstrainRAMSpace=yes
> ConstrainKmemSpace=no
> ConstrainSwapSpace=yes
> AllowedRamSpace=100
> AllowedSwapSpace=0
> MaxRAMPercent=100
> MaxSwapPercent=100
> MinRAMSpace=30
>
> I think those are the relevant portions.
>
> regards,
> --
> operator may differ from spokesperson.              hahn at mcmaster.ca
>
>

-- 
J. Michael Mosley
University Research Computing
The University of North Carolina at Charlotte
9201 University City Blvd
Charlotte, NC  28223
704.687.7065    jmmosley at uncc.edu <mmosley at uncc.edu>