[slurm-users] Restarting jobs

Nicolas Sonoda nicolas.sonoda at versatushpc.com.br
Fri Aug 19 17:37:23 UTC 2022


Hi Paul!

Thank you very much for the explanation!
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Paul Brunk <pbrunk at uga.edu>
Sent: Friday, August 19, 2022 09:23
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Restarting jobs


Hi Nicolas!



In Slurm lingo this is "job requeueing".  The JobRequeue
slurm.conf parameter controls whether Slurm tries to start those
jobs again (requeue vs. job exit).



The slurm.conf doc puts it nicely:



This option controls the default ability for batch jobs to be
requeued. Jobs may be requeued explicitly by a system
administrator, after node failure, or upon preemption by a
higher priority job. If JobRequeue is set to a value of 1, then
batch jobs may be requeued unless explicitly disabled by the
user. If JobRequeue is set to a value of 0, then batch jobs will
not be requeued unless explicitly enabled by the user. Use the
sbatch --no-requeue or --requeue option to change the default
behavior for individual jobs. The default value is 1.
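
As a rough sketch of what that looks like in practice (not taken from
the docs quote above): the first line would go in slurm.conf, the
#SBATCH lines in a user's batch script, and the last command applies
to a job that is already queued or running. The job name and <jobid>
are placeholders, and I'm assuming the scontrol "Requeue" job field
behaves on your Slurm version as it does on the ones I've used, so
check the man pages before relying on it.

```
# slurm.conf: cluster-wide default, batch jobs are not requeued
JobRequeue=0

# Batch script: per-job override (job name is a placeholder)
#SBATCH --job-name=my-job
#SBATCH --no-requeue

# Already-submitted job: turn requeueing off (replace <jobid>)
scontrol update JobId=<jobid> Requeue=0
```

A slurm.conf change normally needs "scontrol reconfigure" (or a
slurmctld restart) to take effect, and "scontrol show job <jobid>"
should list the job's current Requeue value.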



--
Paul Brunk, system administrator
Advanced Computing Resource Center
Enterprise IT Svcs, the University of Georgia





On 8/18/22, 1:57 PM, "slurm-users" <slurm-users-bounces at lists.schedmd.com> wrote:

Hi!



This week my machines rebooted, and the jobs that were running restarted, so I lost the progress they had made. Can I prevent jobs from restarting like that? For example, if my machines reboot, I would like the jobs to be cancelled instead.





Thank you.

Nícolas

