Dear slurm-user list,
I have had cases where our resumeProgram failed due to temporary cloud timeouts. In those cases the resumeProgram returns a non-zero exit code. Why does Slurm still wait until resumeTimeout expires instead of treating the startup as failed right away, which should then lead to the job being rescheduled?
Is there some way to achieve the described effect, i.e. to tell Slurm "You can stop waiting, the node won't come alive", or am I missing the correct way to handle this in Slurm?
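
To make the question concrete, here is a minimal sketch of the kind of signal meant above. It assumes that setting the node to DOWN with `scontrol update` from inside the resumeProgram is enough for Slurm to give up on the node before resumeTimeout; that assumption is exactly what is being asked, not confirmed behaviour. `resume_node` is a hypothetical placeholder for the real cloud call, and the node names are assumed to be already expanded into a plain list.

```python
import subprocess


def build_mark_down_command(node, reason):
    """Build the scontrol call that sets one node to DOWN with a reason."""
    return ["scontrol", "update", f"NodeName={node}", "State=DOWN", f"Reason={reason}"]


def mark_node_down(node, reason, run=subprocess.run):
    """Run the scontrol call; `run` is injectable so this can be tried without Slurm."""
    return run(build_mark_down_command(node, reason), check=False)


def resume_node(node):
    """Hypothetical placeholder for the cloud API call that boots the node."""
    raise TimeoutError("cloud API timed out")


def resume_all(nodes, start=resume_node, run=subprocess.run):
    """Try to start every node; mark the ones that fail as DOWN.

    Returns 0 if all nodes started and 1 otherwise, mirroring the
    non-zero exit code mentioned above.
    """
    failed = []
    for node in nodes:
        try:
            start(node)
        except Exception:
            # Assumption: DOWN tells Slurm to stop waiting for this node.
            mark_node_down(node, "resumeProgram_failed", run=run)
            failed.append(node)
    return 1 if failed else 0
```

In a real resumeProgram the return value of `resume_all` would be passed to `sys.exit`, and the node list would come from the hostlist expression Slurm passes as the first argument.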
Best regards, Xaver