[slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

Bjørn-Helge Mevik b.h.mevik at usit.uio.no
Thu Oct 10 08:51:12 UTC 2019


Matthew BETTINGER <matthew.bettinger at external.total.com> writes:

> Just curious if this option or oom setting (which we use) can leave
> the nodes in CG "completing" state.

I don't think so.  As far as I know, jobs go into the completing state
when Slurm cancels them or when they exit on their own, and they stay in
that state until any epilogs have run.  In my experience, the most
common reason for jobs hanging in CG is a disk system failure or some
other problem that leaves either the job processes or the epilog
processes stuck in "disk wait".

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo