[slurm-users] 17.11+auks+cgroups: finished jobs hang in completing state

Robbert Eggermont R.Eggermont at tudelft.nl
Mon Mar 26 01:36:34 MDT 2018


Hi Chris,

On 26-03-18 05:04, Christopher Samuel wrote:
> Does the slurmd log report it trying to kill the auks process?

The first thing I need to do is turn up the logging verbosity.

> https://bugs.schedmd.com/show_bug.cgi?id=4733

> The fact that auks is hanging around makes me wonder if this is a
> different issue, but you never know..

It's not a 100% match but it's the closest I've found so far. I'll need 
to study this some more.

I left a test job hanging last night. This morning the slurmstepd was 
gone, but the auks process is still there (orphaned).
That is different from last night, when the nodes were drained because 
of a batch job failure.
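
For anyone who wants to check their own nodes for the same symptom, here is a minimal sketch that lists orphaned processes by reading /proc. It rests on assumptions: that the leftover process shows up with the command name "auks", and that "orphaned" means it has been reparented to PID 1 (on a system with a subreaper the parent may be something else). The function names are only illustrative.

```python
import os

def parse_stat(stat_text):
    """Return (pid, comm, ppid) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    comm = stat_text[left + 1:right]
    fields = stat_text[right + 1:].split()
    # fields[0] is the process state, fields[1] is the parent PID.
    return pid, comm, int(fields[1])

def find_orphans(name="auks", proc_root="/proc"):
    """Return sorted PIDs of processes called `name` whose parent is PID 1.

    The default name "auks" is an assumption about how the process appears.
    """
    orphans = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, ppid = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited while scanning, or stat was unparsable
        if comm == name and ppid == 1:
            orphans.append(pid)
    return sorted(orphans)

if __name__ == "__main__":
    print(find_orphans())
```

Running it on a compute node after a job has finished prints the PIDs of any matching processes that no longer have a slurmstepd parent; an empty list means none were found under these assumptions.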

I'll report back when I find out more.

Robbert

-- 
Robbert Eggermont
Intelligent Systems Support & Data Steward | TU Delft
+31 15 27 83234 | Building 28, Floor 5, Room W660
Available Mon, Wed-Fri
