[slurm-users] 17.11+auks+cgroups: finished jobs hang in completing state
Robbert Eggermont
R.Eggermont at tudelft.nl
Mon Mar 26 01:36:34 MDT 2018
Hi Chris,
On 26-03-18 05:04, Christopher Samuel wrote:
> Does the slurmd log report it trying to kill the auks process?
The first thing I need to do is turn up the logging verbosity.
> https://bugs.schedmd.com/show_bug.cgi?id=4733
> The fact that auks is hanging around makes me wonder if this is a
> different issue, but you never know..
It's not a 100% match but it's the closest I've found so far. I'll need
to study this some more.
I left a test job hanging last night, and this morning the slurmstepd
was gone, but the auks process is still there (orphaned)...
That is different from last night, when the nodes were drained because
of a batch job failure...
I'll report back when I find out more.
Robbert
--
Robbert Eggermont
Intelligent Systems Support & Data Steward | TU Delft
+31 15 27 83234 | Building 28, Floor 5, Room W660
Available Mon, Wed-Fri