[slurm-users] draining nodes due to failed killing of task?

Fri Aug 6 10:27:20 UTC 2021

Hi.

Might it be due to a timeout (maybe the killed job is creating a core 
file, or caused heavy swap usage)?
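
If it is a timeout, the setting I would look at first is UnkillableStepTimeout in slurm.conf (as far as I know it defaults to 60 seconds; please check the man page for your version). In the meantime, here is a rough sketch, not a tested tool, for listing the nodes that carry this reason and printing the scontrol commands that would resume them. It assumes the input comes from something like `sinfo -h -N -o "%N|%T|%E"`; the function names and the third sample node are made up for illustration.

```python
# Sketch: find nodes drained with "Kill task failed" and print resume commands.
# Assumption: the input has one "node|state|reason" record per line, e.g. from
#   sinfo -h -N -o "%N|%T|%E"
# Nothing here calls Slurm; it only parses text and builds command strings.

KILL_REASON = "Kill task failed"


def nodes_with_kill_failure(sinfo_text):
    """Return the sorted node names whose reason starts with KILL_REASON."""
    nodes = set()
    for line in sinfo_text.splitlines():
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        node, _state, reason = (p.strip() for p in parts)
        if reason.startswith(KILL_REASON):
            nodes.add(node)
    return sorted(nodes)


def resume_commands(nodes):
    """Build (but do not run) one scontrol command per node."""
    return [f"scontrol update NodeName={n} State=RESUME" for n in nodes]


if __name__ == "__main__":
    # The first two records mirror the nodes in the quoted sinfo output;
    # alien-0-12 is a hypothetical healthy node added for contrast.
    sample = (
        "alien-0-47|draining|Kill task failed\n"
        "alien-0-56|drained|Kill task failed\n"
        "alien-0-12|idle|none\n"
    )
    for cmd in resume_commands(nodes_with_kill_failure(sample)):
        print(cmd)
```

The commands are only printed, so they can be reviewed before anything is run. I would check that no stuck processes are left on a node before resuming it.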

BYtE,
  Diego

On 06/08/2021 09:02, Adrian Sevcenco wrote:
> Having just implemented some triggers, I noticed this:
> 
> NODELIST    NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
> alien-0-47      1    alien*    draining   48   48:1:1 193324   214030      1 rack-0,4 Kill task failed
> alien-0-56      1    alien*     drained   48   48:1:1 193324   214030      1 rack-0,4 Kill task failed
> 
> I was wondering why a node is drained when killing a task fails, and how
> I can disable this (I use cgroups).
> Moreover, how can killing a task fail? (This is on Slurm 19.05.)
> 
> Thank you!
> Adrian

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786