[slurm-users] 17.11+auks+cgroups: finished jobs hang in completing state
Christopher Samuel
chris at csamuel.org
Sun Mar 25 21:04:00 MDT 2018
On 26/03/18 12:43, Robbert Eggermont wrote:
> Does this sound familiar to anyone?
Does the slurmd log report it trying to kill the auks process?
Also you might want to have a look at:
https://bugs.schedmd.com/show_bug.cgi?id=4733
to see if that bug fits what you're seeing. Basically, I see a
slurmstepd get stuck, deadlocked internally in free_list_lock(), for
reasons that are not yet understood.
You'll need to use pstack or gdb to see the backtraces of its threads.
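A minimal sketch of collecting those thread backtraces on a node is below. It assumes Linux, root (or ptrace) permission, gdb on PATH, and that the stuck process reports its name in /proc/<pid>/comm as "slurmstepd". The helper names find_pids and gdb_backtrace_cmd are hypothetical. Running "pstack <pid>" by hand on each slurmstepd PID would give similar output where pstack is installed.

```python
import os
import subprocess


def find_pids(name: str, proc_root: str = "/proc") -> list[int]:
    """Return PIDs whose /proc/<pid>/comm matches `name` (Linux only)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # Process exited or its comm file is unreadable; skip it.
            continue
    return sorted(pids)


def gdb_backtrace_cmd(pid: int) -> list[str]:
    """Build a non-interactive gdb command that dumps every thread's backtrace."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def main() -> None:
    for pid in find_pids("slurmstepd"):
        print(f"=== slurmstepd pid {pid} ===", flush=True)
        # gdb attaches, prints all thread backtraces, then detaches.
        subprocess.run(gdb_backtrace_cmd(pid), check=False)


if __name__ == "__main__":
    main()
```

A thread whose backtrace sits inside free_list_lock() would point towards the bug above.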
The fact that auks is hanging around makes me wonder if this is a
different issue, but you never know.
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC