[slurm-users] stopping job array after N failed jobs in row

Michael DiDomenico mdidomenico4 at gmail.com
Wed Aug 2 16:01:16 UTC 2023

On Tue, Aug 1, 2023 at 3:27 PM Daniel Letai <dani at letai.org.il> wrote:
> The other OTHER approach might be to use some epilog (or possibly epilogslurmctld) to log exit codes for first 20 tasks in each array, and cancel the array if non-zero. This is a global approach which will affect all job arrays, so might not be appropriate for your use case.

you can set up a task prolog/epilog (TaskProlog/TaskEpilog in slurm.conf,
or srun --task-prolog/--task-epilog).  just test for the error condition
in the task epilog and then cancel your array if need be


i haven't tried it and don't know how it interacts with arrays, but it might work
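
a rough, untested sketch of the decision logic such an epilog could call, in python. the function names here are hypothetical, and how the exit codes get collected (a shared log file, sacct, etc.) is left to the caller and is an assumption, not something slurm hands you in this form. the only slurm piece used is `scancel <array_job_id>`, which cancels the whole array. note that array tasks can run concurrently, so "in a row" means whatever order you record the codes in.

```python
import subprocess


def trailing_failures(exit_codes):
    """Count how many of the most recently recorded tasks failed in a row."""
    count = 0
    for code in reversed(exit_codes):
        if code == 0:
            break  # a success ends the current failure streak
        count += 1
    return count


def maybe_cancel_array(array_job_id, exit_codes, threshold, run=subprocess.run):
    """Run `scancel <array_job_id>` once `threshold` tasks in a row have failed.

    `run` is injectable so the logic can be tried without a Slurm cluster.
    Returns True if the cancel command was issued, False otherwise.
    """
    if trailing_failures(exit_codes) < threshold:
        return False
    run(["scancel", str(array_job_id)], check=False)
    return True
```

inside a real epilog you would pass in the array's job id (for example from SLURM_ARRAY_JOB_ID, if it is set in that environment) along with the codes recorded so far.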
