[slurm-users] Jobs killed by OOM-killer only on certain nodes.

Chris Samuel chris at csamuel.org
Thu Jul 2 20:23:48 UTC 2020

On Thursday, 2 July 2020 6:52:15 AM PDT Prentice Bisbal wrote:

> [2020-07-01T16:19:19.463] [801777.extern] _oom_event_monitor: oom-kill
> event count: 1

We get that line for pretty much every job; I don't think it reflects the OOM
killer being invoked on something in the extern step.

OOM killer invocations should be recorded in the kernel logs on the node.
Check with "dmesg -T" to see if it's being invoked (or check whether they
are getting logged via syslog, if they've been dropped from the ring buffer
due to later messages).
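
As a rough sketch of that check, something like the following could be run
on an affected node. The function name oom_lines and the grep patterns are
my own assumptions based on common kernel message wording, not anything
Slurm provides, and the journalctl fallback assumes the node runs
systemd-journald; on other setups the kernel messages may end up in a
syslog file whose path depends on the distribution.

```shell
#!/bin/sh
# Hypothetical helper: keep only log lines that look like OOM-killer activity.
# The patterns are assumptions drawn from typical kernel messages such as
# "... invoked oom-killer: ..." and "Out of memory: Killed process ...".
oom_lines() {
    grep -i -E 'oom-kill|out of memory'
}

# Usage on the compute node (reading the ring buffer may require root,
# depending on kernel.dmesg_restrict):
#   dmesg -T | oom_lines
#
# If the messages have already rotated out of the ring buffer and the node
# uses systemd-journald, the kernel messages can be read from the journal:
#   journalctl -k | oom_lines
```

If neither command prints anything for the time of the job, the kernel
probably did not invoke the OOM killer on that node for it.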

All the best,
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
