[slurm-users] strigger on CG, completing state
irush at cs.huji.ac.il
Wed May 29 08:01:31 UTC 2019
Check the UnkillableStepProgram and UnkillableStepTimeout options in
slurm.conf.
We use it to drain the stuck nodes and to mail us, since, as here, stuck
processes usually require a reboot. Because the drained strigger will
never fire, we also set a finished trigger on the next RUNNING job. That
trigger either sends us mail if only stuck processes remain, or sets
another strigger --fini on the next RUNNING job.
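
A minimal sketch of the scheme described above, to make the chain of
triggers concrete. It is an illustration under assumptions, not the script
actually used at the site: the script paths, the mail recipient, the
timeout value and the use of `hostname -s` as the node name are all
hypothetical. It also assumes that strigger accepts extra program arguments
inside a quoted --program value and that the calling user is allowed to set
triggers; check both against the strigger man page for your Slurm version.

```shell
#!/bin/bash
# Sketch of an UnkillableStepProgram. Assumed slurm.conf entries:
#   UnkillableStepProgram=/usr/local/sbin/unkillable.sh   (hypothetical path)
#   UnkillableStepTimeout=120                              (example value)

ADMIN_MAIL="root"                                   # assumption: alert recipient
FINI_PROGRAM="/usr/local/sbin/node_fini_check.sh"   # hypothetical follow-up script

# Print the IDs of jobs currently RUNNING on the given node, one per line.
# Jobs stuck in COMPLETING are not listed because of the state filter.
running_jobs() {
    squeue -h -w "$1" -t RUNNING -o '%A'
}

# Drain the node so no new work lands on it, then mail the admins.
drain_and_mail() {
    local node="$1"
    scontrol update NodeName="$node" State=DRAIN Reason="unkillable step"
    echo "Unkillable step on $node; node set to drain." |
        mail -s "Slurm: stuck processes on $node" "$ADMIN_MAIL"
}

# If a job is still running on the node, set a --fini trigger on it;
# otherwise only stuck processes remain, so tell the admins.
watch_next_job() {
    local node="$1" next
    next="$(running_jobs "$node" | head -n 1)"
    if [ -n "$next" ]; then
        strigger --set --jobid="$next" --fini --program="$FINI_PROGRAM $node"
    else
        echo "Only stuck processes remain on $node; it can be rebooted." |
            mail -s "Slurm: $node ready for reboot" "$ADMIN_MAIL"
    fi
}

main() {
    local node
    node="$(hostname -s)"   # assumption: node name equals the short hostname
    drain_and_mail "$node"
    watch_next_job "$node"
}

# Only act on hosts that actually have the Slurm client tools installed.
if command -v scontrol >/dev/null 2>&1; then
    main
fi
```

The hypothetical node_fini_check.sh would simply call the same
watch_next_job logic again for the node it is given, so the chain continues
until no RUNNING job is left on that node.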
On Tue, May 28, 2019 at 7:58 PM mercan <ahmet.mercan at uhem.itu.edu.tr> wrote:
> If you do not already use an epilog script, you can set one up to
> clean up any residue left by finished jobs:
> Ahmet M.
> On 28.05.2019 at 19:03, Matthew BETTINGER wrote:
> > We use triggers for the obvious alerts, but is there a way to make a
> > trigger for nodes stuck in CG (completing) state? Some user jobs, mostly
> > Julia notebooks, can hang in the completing state if the user kills the
> > running job or cancels it with Ctrl-C. When this happens we can have
> > many, many nodes stuck in CG. Slurm 17.02.6. Thanks!
\/ | Yair Yarom | Senior DevOps Architect
 | The Rachel and Selim Benin School
 /\ | of Computer Science and Engineering
//\\/ | The Hebrew University of Jerusalem
[// \\ | T +972-2-5494522 | F +972-2-5494522
// \ | irush at cs.huji.ac.il