[slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

Bjørn-Helge Mevik b.h.mevik at usit.uio.no
Tue Oct 8 11:04:28 UTC 2019


Juergen Salk <juergen.salk at uni-ulm.de> writes:

> that is interesting. We have a very similar setup as well. However, in
> our Slurm test cluster I have noticed that it is not the *job* that
> gets killed. Instead, the OOM killer terminates one (or more)
> *processes*

Yes, that is how the kernel OOM killer works.

This is why we always tell users to use "set -o errexit" in their job
scripts.  Then at least the job script exits as soon as one of its
processes is killed, because a killed process returns a non-zero exit
status.
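
To illustrate the effect, here is a minimal sketch, not taken from our
actual setup.  It simulates a job step being killed with SIGKILL (which
is what the OOM killer sends) and compares a script with and without
errexit.  The helper name run_job and the self-killing step are only
hypothetical stand-ins for a real job script and a memory-hungry
program.

```shell
#!/bin/bash
# Simulate one step of a job script being killed with SIGKILL (as the
# OOM killer would do) and compare behaviour with and without errexit.

run_job() {
    # $1 is either "errexit" or "noerrexit".  The job body runs in a
    # child bash so that its exit does not end this demonstration.
    bash -s "$1" <<'EOF'
if [ "$1" = "errexit" ]; then
    set -o errexit
fi
# Stand-in for a memory-hungry step: this child shell kills itself
# with SIGKILL, so its exit status is 137 (128 + 9).
bash -c 'kill -KILL $$'
echo "post-processing ran"
EOF
}

# Capture stdout and exit status of each variant.
out_with=$(run_job errexit) && status_with=0 || status_with=$?
out_without=$(run_job noerrexit) && status_without=0 || status_without=$?

echo "with errexit:    status=$status_with output='$out_with'"
echo "without errexit: status=$status_without output='$out_without'"
```

With errexit the script should stop at the killed step and return that
step's status, so the later command never runs.  Without it, the script
carries on and exits with status 0, and the job looks successful.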

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo