[slurm-users] Jobs killed by OOM-killer only on certain nodes.
    Chris Samuel 
    chris at csamuel.org
       
    Thu Jul  2 20:23:48 UTC 2020
    
    
  
On Thursday, 2 July 2020 6:52:15 AM PDT Prentice Bisbal wrote:
> [2020-07-01T16:19:19.463] [801777.extern] _oom_event_monitor: oom-kill
> event count: 1
We get that line for pretty much every job, so I don't think it reflects the
OOM killer actually being invoked on something in the extern step.
OOM killer invocations should be recorded in the kernel logs on the node.
Check with "dmesg -T" to see whether it's being invoked (or check whether the
messages are being logged via syslog, in case they've been dropped from the
ring buffer by later messages).
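
If you want to script that check across the affected nodes, here is a rough
sketch in Python. The function names and the regular expression are my own
assumptions, not anything Slurm provides; the pattern covers the phrases the
kernel usually prints ("invoked oom-killer", "Out of memory", "Killed
process"), but the exact wording varies between kernel versions, so treat it
as a starting point.

```python
import re
import subprocess

# Assumed phrases: typical kernel OOM-killer messages. Wording differs
# between kernel versions, so adjust the pattern for your nodes.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|out of memory|killed process", re.IGNORECASE
)


def find_oom_lines(log_text):
    """Return the lines of log_text that look like OOM-killer messages."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]


def oom_lines_from_dmesg():
    """Run `dmesg -T` and return the matching lines.

    Reading the ring buffer may need root on some systems; a failure
    raises subprocess.CalledProcessError.
    """
    result = subprocess.run(
        ["dmesg", "-T"], capture_output=True, text=True, check=True
    )
    return find_oom_lines(result.stdout)
```

Calling oom_lines_from_dmesg() on a node returns an empty list if nothing in
the ring buffer matches. find_oom_lines() can also be fed the contents of a
syslog file in case the messages have already rolled out of the buffer.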
All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
    
    