[slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?
b.h.mevik at usit.uio.no
Tue Oct 8 11:04:28 UTC 2019
Juergen Salk <juergen.salk at uni-ulm.de> writes:
> that is interesting. We have a very similar setup as well. However, in
> our Slurm test cluster I have noticed that it is not the *job* that
> gets killed. Instead, the OOM killer terminates one (or more)
Yes, that is how the kernel OOM killer works.
This is why we always tell users to use "set -o errexit" in their job
scripts. Then at least the job script exits as soon as one of its
processes is killed, because the killed command returns a non-zero
exit status.
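
As a rough illustration of the effect, here is a minimal sketch in bash.
Everything in it is a stand-in: demo_job is a hypothetical name, the
child shell plays the role of the job script, and a process that sends
itself SIGKILL plays the role of a process terminated by the OOM killer.
It is not a real Slurm job script.

```shell
#!/bin/bash
# Sketch only: demo_job is a hypothetical helper, not part of Slurm.
# The outer "bash -c" stands in for a job script using errexit.
demo_job() {
    bash -c '
        set -o errexit
        echo "step 1 starts"
        # Stand-in for an OOM-killed process: it sends SIGKILL to itself.
        bash -c "kill -KILL \$\$"
        # Not reached: errexit stops the script at the killed command.
        echo "step 2 starts"
    '
}

# Run the simulated job and report how it ended.
demo_job || echo "job script exited with status $?"
```

On Linux the reported status is 137 (128 + 9 for SIGKILL), and "step 2
starts" is never printed. Without "set -o errexit" the script would carry
on to step 2 as if nothing had happened. Note that errexit does not
trigger for commands tested in an "if" or joined with "&&" or "||".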
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo