[slurm-users] Drain node from TaskProlog / TaskEpilog
Mark Dixon
mark.c.dixon at durham.ac.uk
Mon May 24 10:02:07 UTC 2021
Hi all,
Sometimes our compute nodes get into a failed state which we can only
detect from inside the job environment.
I can see that TaskProlog / TaskEpilog allow us to run our detection
test; however, unlike Prolog and Epilog, they do not drain the node if they
exit with a non-zero exit code.
Does anyone have advice on automatically draining a node in this
situation, please?
Best wishes,
Mark