[slurm-users] 17.11+auks+cgroups: finished jobs hang in completing state

Christopher Samuel chris at csamuel.org
Sun Mar 25 21:04:00 MDT 2018


On 26/03/18 12:43, Robbert Eggermont wrote:

> Does this sound familiar to anyone?

Does the slurmd log report it trying to kill the auks process?

Also you might want to have a look at:

https://bugs.schedmd.com/show_bug.cgi?id=4733

to see if that bug fits what you're seeing. Basically, I end up with a
stuck slurmstepd that is deadlocked internally on free_list_lock(), for
reasons that are not yet understood.

You'll need to use pstack or gdb to see the thread info.
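
Something along these lines should dump the thread stacks. It's a rough
sketch only: it assumes gdb is installed on the compute node and that
you run it as root, and the function name dump_thread_stacks is just
something made up for illustration.

```shell
#!/bin/sh
# Dump a backtrace of every thread for each PID passed as an argument.
# Assumption: gdb is installed and we have permission to attach (root).
dump_thread_stacks() {
    if [ "$#" -eq 0 ]; then
        echo "usage: dump_thread_stacks PID..." >&2
        return 2
    fi
    for pid in "$@"; do
        echo "=== threads of PID $pid ==="
        # -batch makes gdb exit after running the -ex commands; it
        # detaches on exit, so the process is left running.
        gdb -p "$pid" -batch -ex "thread apply all bt" 2>&1
    done
}
```

You could then call it as dump_thread_stacks $(pgrep slurmstepd) on the
affected node (assuming pgrep matches the stuck step daemon there), or
just run pstack against a single PID if you have it installed.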

The fact that auks is hanging around makes me wonder if this is a
different issue, but you never know.

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


