[slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

Marcus Boden mboden at gwdg.de
Tue Oct 8 08:46:31 UTC 2019


Hi Jürgen,

You're looking for the KillOnBadExit parameter in slurm.conf:
KillOnBadExit
    If set to 1, a step will be terminated immediately if any task is crashed or aborted, as indicated by a non-zero exit code. With the default value of 0, if one of the processes is crashed or aborted the other processes will continue to run while the crashed or aborted process waits. The user can override this configuration parameter by using srun's -K, --kill-on-bad-exit.

Since an OOM-killed process ends with a non-zero exit status, this should terminate the job if a step or a process gets OOM-killed.
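
A minimal sketch of how this could be set, assuming you want it as the site-wide default; the program name ./my_app in the comment is only a placeholder and not something from this thread:

```
# slurm.conf: terminate a step as soon as any of its tasks exits non-zero
KillOnBadExit=1

# Per-step override by the user, e.g. inside a batch script:
#   srun --kill-on-bad-exit=1 ./my_app
```

After editing slurm.conf, the change normally has to be propagated to the daemons, e.g. with "scontrol reconfigure". It is worth trying this on a test cluster first to confirm that an OOM kill really takes the rest of the step down in your setup.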

Best,
Marcus

On 19-10-08 10:36, Juergen Salk wrote:
> * Bjørn-Helge Mevik <b.h.mevik at usit.uio.no> [191008 08:34]:
> > Jean-mathieu CHANTREIN <jean-mathieu.chantrein at univ-angers.fr> writes:
> > 
> > > I tried using, in slurm.conf 
> > > TaskPlugin=task/affinity, task/cgroup 
> > > SelectTypeParameters=CR_CPU_Memory 
> > > MemLimitEnforce=yes 
> > >
> > > and in cgroup.conf: 
> > > CgroupAutomount=yes 
> > > ConstrainCores=yes 
> > > ConstrainRAMSpace=yes 
> > > ConstrainSwapSpace=yes 
> > > MaxSwapPercent=10 
> > > TaskAffinity=no 
> > 
> > We have a very similar setup, the biggest difference being that we have
> > MemLimitEnforce=no, and leave the killing to the kernel's cgroup.  For
> > us, jobs are killed as they should. [...] 
> 
> Hello Bjørn-Helge,
> 
> that is interesting. We have a very similar setup as well. However, in
> our Slurm test cluster I have noticed that it is not the *job* that
> gets killed. Instead, the OOM killer terminates one (or more)
> *processes* but keeps the job itself running in a potentially 
> unhealthy state.
> 
> Is there a way to tell Slurm to terminate the whole job as soon as 
> the first OOM kill event takes place during execution? 
> 
> Best regards
> Jürgen
> 
> -- 
> Jürgen Salk
> Scientific Software & Compute Services (SSCS)
> Kommunikations- und Informationszentrum (kiz)
> Universität Ulm
> Telefon: +49 (0)731 50-22478
> Telefax: +49 (0)731 50-22471
> 

-- 
Marcus Vincent Boden, M.Sc.
Arbeitsgruppe eScience
Tel.:   +49 (0)551 201-2191
E-Mail: mboden at gwdg.de
---------------------------------------
Gesellschaft fuer wissenschaftliche
Datenverarbeitung mbH Goettingen (GWDG)
Am Fassberg 11, 37077 Goettingen
URL:    http://www.gwdg.de
E-Mail: gwdg at gwdg.de
Tel.:   +49 (0)551 201-1510
Fax:    +49 (0)551 201-2150
Geschaeftsfuehrer: Prof. Dr. Ramin Yahyapour
Aufsichtsratsvorsitzender:
Prof. Dr. Christian Griesinger
Sitz der Gesellschaft: Goettingen
Registergericht: Goettingen
Handelsregister-Nr. B 598
---------------------------------------