[slurm-users] Drain node from TaskProlog / TaskEpilog

Mark Dixon mark.c.dixon at durham.ac.uk
Mon May 24 10:02:07 UTC 2021


Hi all,

Sometimes our compute nodes get into a failed state which we can only 
detect from inside the job environment.

I can see that TaskProlog / TaskEpilog allow us to run our detection 
test; however, unlike Prolog and Epilog, they do not drain the node when 
they exit with a non-zero exit code.
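
To make the situation concrete, here is a minimal sketch of the kind of 
TaskProlog script I have in mind. The check itself (a writable scratch 
directory), the function names and the NODE_CHECK variable are placeholders 
invented for illustration, not our real test. It relies only on the 
documented TaskProlog behaviour that "export NAME=value" and "print ..." 
lines written to standard output are interpreted by slurmd.

```shell
#!/bin/sh
# Hypothetical TaskProlog sketch. The check and all names are assumptions.

# check_node DIR: succeed only if DIR exists and is writable.
# Placeholder for whatever site-specific test detects the failed state.
check_node() {
    [ -d "$1" ] && [ -w "$1" ]
}

# run_task_prolog DIR: write TaskProlog directives to stdout.
# "export NAME=value" sets a variable in the task environment;
# "print ..." sends text to the task's standard output.
run_task_prolog() {
    if check_node "$1"; then
        echo "export NODE_CHECK=ok"
        return 0
    fi
    echo "print node check failed on ${SLURMD_NODENAME:-unknown}"
    echo "export NODE_CHECK=failed"
    # Non-zero status: as described above, this does not drain the node.
    return 1
}

run_task_prolog "${TMPDIR:-/tmp}"
```

With something like this the job can at least see NODE_CHECK=failed, but 
the node stays in service for the next job, which is the gap I am asking 
about.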

Does anyone have advice on automatically draining a node in this 
situation, please?

Best wishes,

Mark


