[slurm-users] strigger on CG, completing state
Yair Yarom
irush at cs.huji.ac.il
Wed May 29 08:01:31 UTC 2019
Hi,
Check the UnkillableStepProgram and UnkillableStepTimeout options in
slurm.conf.
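
As an illustration only, the two options might appear in slurm.conf as shown here. The script path and the timeout value are placeholders, not values taken from this thread; UnkillableStepTimeout is given in seconds.

```
# Hypothetical path; the program runs on the compute node when a step's
# processes cannot be killed within the timeout.
UnkillableStepProgram=/usr/local/sbin/unkillable_step.py
# Placeholder value, in seconds.
UnkillableStepTimeout=120
```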
We use it to drain the stuck nodes and to mail us, since, as here, stuck
processes will usually require a reboot. Because the drained strigger will
never get triggered, we also set a finished trigger (strigger --fini) for
the next RUNNING job. When that job ends, the trigger either sends us mail
if only stuck processes remain, or sets another strigger --fini for the
next RUNNING job.
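
The flow described above could be sketched roughly as follows. This is not the script used at any site in this thread; it is a minimal sketch under stated assumptions: the admin address, the follow-up program path, the drain reason text, and the use of a local `mail` command are all hypothetical, and the node name is assumed to match the short hostname. Setting triggers may also require running as the Slurm user, depending on site configuration.

```python
import socket
import subprocess

# Hypothetical values for illustration; neither appears in the thread.
ADMIN_ADDRESS = "root"
FOLLOWUP_PROGRAM = "/usr/local/sbin/after_job_fini.py"


def drain_command(node, reason):
    """scontrol call that drains a node and records why."""
    return ["scontrol", "update", f"NodeName={node}",
            "State=DRAIN", f"Reason={reason}"]


def running_jobs_command(node):
    """squeue call that prints one RUNNING job ID per line for a node."""
    return ["squeue", "--noheader", f"--nodelist={node}",
            "--states=RUNNING", "--format=%i"]


def fini_trigger_command(job_id, program):
    """strigger call that runs `program` once `job_id` finishes."""
    return ["strigger", "--set", f"--jobid={job_id}", "--fini",
            f"--program={program}"]


def next_action(running_job_ids, program):
    """Pick the follow-up: wait on the next RUNNING job, or send mail."""
    if running_job_ids:
        return ("trigger", fini_trigger_command(running_job_ids[0], program))
    return ("mail", None)


def main():
    # Assumes the Slurm NodeName equals the short hostname.
    node = socket.gethostname().split(".")[0]
    subprocess.run(drain_command(node, "unkillable step"), check=False)
    listing = subprocess.run(running_jobs_command(node),
                             capture_output=True, text=True, check=False)
    action, command = next_action(listing.stdout.split(), FOLLOWUP_PROGRAM)
    if action == "trigger":
        subprocess.run(command, check=False)
    else:
        # Assumes a local `mail` command is available on the node.
        subprocess.run(["mail", "-s", f"{node}: only stuck processes remain",
                        ADMIN_ADDRESS],
                       input="Node is drained; a reboot is probably needed.\n",
                       text=True, check=False)
```

A script installed as the UnkillableStepProgram would end by calling main(). The follow-up program could reuse the same logic without the drain step, so that each finished job either hands over to the next RUNNING job or sends the mail.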
Yair.
On Tue, May 28, 2019 at 7:58 PM mercan <ahmet.mercan at uhem.itu.edu.tr> wrote:
> Hi;
>
> If you are not already using an epilog script, you can configure one to
> clean up anything left behind by finished jobs:
>
>
> https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-prolog-and-epilog-scripts
>
> Ahmet M.
>
>
> On 28.05.2019 at 19:03, Matthew BETTINGER wrote:
> > We use triggers for the obvious alerts, but is there a way to make a
> > trigger for nodes stuck in CG (completing) state? Some user jobs, mostly
> > Julia notebooks, can get hung in completing state if the user kills the
> > running job or cancels it with cntrl. When this happens we can have many
> > nodes stuck in CG. Slurm 17.02.6. Thanks!
> >
>
>
--
/| |
\/ | Yair Yarom | Senior DevOps Architect
[] | The Rachel and Selim Benin School
[] /\ | of Computer Science and Engineering
[]//\\/ | The Hebrew University of Jerusalem
[// \\ | T +972-2-5494522 | F +972-2-5494522
// \ | irush at cs.huji.ac.il
// |