[slurm-users] Restart Job after sudden reboot of the node
Steven Dick
kg4ydw at gmail.com
Fri Jul 24 19:51:15 UTC 2020
Both
See man sbatch, --requeue
The default is to not requeue (unless it was changed in slurm.conf), and
your job can check $SLURM_RESTART_COUNT to see if it has been restarted.
This is handy if your job can checkpoint / restart.
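
As a rough sketch of how that could look in a batch script: the cluster-wide default comes from slurm.conf, so this requests requeueing explicitly with --requeue and then branches on $SLURM_RESTART_COUNT. The job name, the start_mode helper and the commented-out my_app command are made up for illustration; only --requeue and SLURM_RESTART_COUNT come from the sbatch man page.

```shell
#!/bin/bash
#SBATCH --job-name=requeue-demo   # hypothetical job name
#SBATCH --requeue                 # allow Slurm to requeue this job after a node failure

# Hypothetical helper: map a restart count to a start mode.
start_mode() {
    if [ "${1:-0}" -gt 0 ]; then
        echo resume    # job has been restarted: continue from the last checkpoint
    else
        echo fresh     # first run: start from scratch
    fi
}

# SLURM_RESTART_COUNT may be unset on the first run, so default it to 0.
mode="$(start_mode "${SLURM_RESTART_COUNT:-0}")"
echo "restart count: ${SLURM_RESTART_COUNT:-0}, mode: $mode"

# ./my_app --mode "$mode"   # hypothetical application that knows how to resume
```

A requeued job starts its batch script again from the beginning, so the script itself has to decide whether to resume. If you would rather a job never be restarted, sbatch also accepts --no-requeue.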
On Fri, Jul 24, 2020 at 3:33 PM Saikat Roy <saikat403 at gmail.com> wrote:
> Hello,
>
> I have recently installed SLURM in our ubuntu cluster. I have one doubt
> that if the system somehow automatically restarts due to power failure what
> will happen to the running jobs. Are they going to resume automatically or
> we have to restart manually? If SLURM restarts automatically, is there any
> way to stop it?
> Thanks in advance
>
> with regards,
> saikat
>
>
> --
> Saikat Roy <https://saikat248.github.io/site/>
> IIT Kgp
>
>