[slurm-users] Restart Job after sudden reboot of the node

Steven Dick kg4ydw at gmail.com
Fri Jul 24 19:51:15 UTC 2020


Both are possible.

See man sbatch, --requeue and --no-requeue.
Whether jobs are requeued by default is set by JobRequeue in slurm.conf, and
your job can check $SLURM_RESTART_COUNT to see if it has been restarted.

This is handy if your job can checkpoint / restart.
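
As a rough sketch of how a batch script might use this (the job name, the
checkpoint file name, and my_app are made up for illustration, and the
application is assumed to write its own checkpoints):

```shell
#!/bin/bash
#SBATCH --requeue             # allow Slurm to requeue this job after a node failure
#SBATCH --job-name=ckpt-demo  # hypothetical job name

# SLURM_RESTART_COUNT is only set once the job has been restarted,
# so fall back to 0 on the first run.
restart_count="${SLURM_RESTART_COUNT:-0}"

# Hypothetical checkpoint file; the application has to write it itself.
checkpoint="${CHECKPOINT_FILE:-checkpoint.dat}"

choose_mode() {
    # Print "resume" if this is a restart and a checkpoint exists, else "fresh".
    if [ "$1" -gt 0 ] && [ -f "$2" ]; then
        echo "resume"
    else
        echo "fresh"
    fi
}

mode="$(choose_mode "$restart_count" "$checkpoint")"
echo "restart count: $restart_count, mode: $mode"

# Replace with the real application (my_app is hypothetical):
# if [ "$mode" = "resume" ]; then my_app --resume "$checkpoint"; else my_app; fi
```

If you do not want a job to be restarted after a node failure, submit it
with sbatch --no-requeue instead.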

On Fri, Jul 24, 2020 at 3:33 PM Saikat Roy <saikat403 at gmail.com> wrote:

> Hello,
>
> I have recently installed SLURM on our Ubuntu cluster. I have one question:
> if the system restarts unexpectedly, for example due to a power failure, what
> will happen to the running jobs? Will they resume automatically, or do
> we have to restart them manually? If SLURM restarts them automatically, is
> there any way to stop it?
> Thanks in advance
>
> with regards,
> saikat
>
>
> --
> Saikat Roy <https://saikat248.github.io/site/>
> IIT Kgp
>
>