[slurm-users] Draining hosts because of failing jobs

Gerhard Strangar g.s at arcor.de
Tue May 4 16:09:51 UTC 2021


Hello,

How do you implement something like "drain host after 10 consecutive
failed jobs"? Unlike a host check script, which only checks for known
errors, this would also catch faults nobody anticipated. I'd like to
stop killing jobs just because one node is faulty.
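
To make the rule concrete, here is a minimal sketch of the counting
logic only. It is not an existing Slurm feature: the function name
nodes_to_drain and the input format (a list of (node, succeeded) pairs
in completion order) are assumptions made for illustration. Where the
job results would come from, e.g. an epilog script or accounting
records, is left open.

```python
from collections import defaultdict

THRESHOLD = 10  # consecutive failed jobs before a node should be drained


def nodes_to_drain(job_results, threshold=THRESHOLD):
    """Return nodes that reached `threshold` consecutive failed jobs.

    job_results: iterable of (node, succeeded) pairs in completion order.
    This input format is hypothetical, chosen only for this sketch.
    """
    streak = defaultdict(int)  # current run of failures per node
    flagged = []               # nodes that hit the threshold, in order
    for node, succeeded in job_results:
        if succeeded:
            # Any successful job resets the node's failure streak.
            streak[node] = 0
        else:
            streak[node] += 1
            if streak[node] == threshold and node not in flagged:
                flagged.append(node)
    return flagged
```

Each node returned could then be drained by hand or from a script, for
example with "scontrol update NodeName=<node> State=DRAIN
Reason=<text>". A successful job resets the counter, so a node that
fails only occasionally is never flagged.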

Gerhard


