[slurm-users] Nodes remaining in drain state once job completes
Pawel R. Dziekonski
pawel.dziekonski at kaust.edu.sa
Tue Mar 19 05:50:48 UTC 2019
On 18/03/2019 23.07, Eric Rosenberg wrote:
> [2019-03-15T09:48:43.000] update_node: node rn003 reason set to: Kill task failed
This usually happens for me when one of the shared filesystems
is overloaded and processes are stuck in uninterruptible sleep
(D state), and are thus unable to terminate.
Your cause may be different.
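To check whether this is the cause on a drained node, you can look for processes in the D state. The sketch below assumes a Linux node with the standard /proc layout described in proc(5), where the third field of /proc/<pid>/stat is the process state. It is not a Slurm tool, and the helper names parse_stat and d_state_processes are hypothetical. `ps -eo pid,stat,comm` shows the same information interactively.

```python
import os
from typing import Iterator, Optional, Tuple


def parse_stat(line: str) -> Optional[Tuple[int, str, str]]:
    """Return (pid, comm, state) from one /proc/<pid>/stat line, or None."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' rather than on whitespace.
    open_paren = line.find("(")
    close_paren = line.rfind(")")
    if open_paren == -1 or close_paren == -1:
        return None
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    fields = line[close_paren + 1:].split()
    if not fields:
        return None
    # First field after the command name is the state: 'R', 'S', 'D', ...
    return pid, comm, fields[0]


def d_state_processes(proc_root: str = "/proc") -> Iterator[Tuple[int, str]]:
    """Yield (pid, comm) for processes in uninterruptible sleep (D)."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        path = os.path.join(proc_root, entry, "stat")
        try:
            # errors="replace" guards against non-UTF-8 command names.
            with open(path, errors="replace") as f:
                parsed = parse_stat(f.read())
        except OSError:
            # The process exited (or is unreadable) between listdir() and open().
            continue
        if parsed and parsed[2] == "D":
            yield parsed[0], parsed[1]


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, comm in d_state_processes():
        print(f"{pid}\t{comm}")
```

If the same processes stay in D state for a long time, the job's processes are likely blocked in the kernel (for example on an unresponsive filesystem), which matches the "Kill task failed" reason above.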
HTH, P
--
Dr. Pawel Dziekonski <pawel.dziekonski at kaust.edu.sa>
KAUST Advanced Computing Core Laboratory
https://www.hpc.kaust.edu.sa