[slurm-users] Nodes remaining in drain state once job completes

Pawel R. Dziekonski pawel.dziekonski at kaust.edu.sa
Tue Mar 19 05:50:48 UTC 2019


On 18/03/2019 23.07, Eric Rosenberg wrote:
>  [2019-03-15T09:48:43.000] update_node: node rn003 reason set to: Kill task failed

This usually happens for me when one of the shared filesystems
is overloaded and processes are stuck in uninterruptible sleep
(D), thus unable to terminate.
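
One way to check for this on the drained node is to look for
processes whose state is D. Below is a minimal sketch in Python,
assuming a Linux node with /proc mounted; the function names
parse_stat and find_d_state are my own, not part of Slurm.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left])
    comm = stat_line[left + 1:right]
    # The process state is the first field after the closing parenthesis.
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def find_d_state(proc_root="/proc"):
    """List (pid, comm) for processes currently in uninterruptible sleep."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or entry is unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in find_d_state():
        print(pid, comm)
```

If this lists leftover job processes, the filesystem is the thing
to fix first. Once they are gone, the node can be returned to
service with "scontrol update NodeName=rn003 State=RESUME".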

The cause in your case may be different.

HTH, P

-- 
Dr. Pawel Dziekonski <pawel.dziekonski at kaust.edu.sa>
KAUST Advanced Computing Core Laboratory
https://www.hpc.kaust.edu.sa



