[slurm-users] job restart :: how to find the reason

Paul Edmon pedmon at cfa.harvard.edu
Wed Dec 2 14:18:17 UTC 2020


You can dig through the slurmctld log and search for the JobID. That 
should tell you what Slurm was doing at the time.
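
A minimal sketch of that search, in case it helps. The helper name job_log_lines is made up for illustration, and the job ID and log path in the usage comment are placeholders; the real log location is whatever SlurmctldLogFile is set to on your controller.

```shell
# Hypothetical helper (not part of Slurm): print the lines of a log file
# that mention a given job ID.
# Usage: job_log_lines JOBID LOGFILE
job_log_lines() {
    jobid="$1"
    logfile="$2"
    # Require a non-digit (or a line boundary) on both sides of the ID,
    # so that 1234 does not also match 12345 or 91234.
    grep -E "(^|[^0-9])${jobid}([^0-9]|\$)" "$logfile"
}

# Example, assuming job ID 1234 and a log at /var/log/slurm/slurmctld.log
# (both are placeholders; check the real path first):
#   scontrol show config | grep -i SlurmctldLogFile
#   job_log_lines 1234 /var/log/slurm/slurmctld.log
```

If accounting is enabled, "sacct -j <jobid> --duplicates" may also list the earlier run of a requeued job, which gives you a time window to look at in the log.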

-Paul Edmon-

On 12/2/2020 6:27 AM, Adrian Sevcenco wrote:
> Hi! I encountered a situation where a bunch of jobs were restarted,
> as seen from Requeue=1 Restarts=1 BatchFlag=1 Reboot=0
> ExitCode=0:0
>
> So I would like to know how I can find out why there is a Requeue
> (when there is only one partition defined) and why there is a restart.
>
> Thanks a lot!!!
> Adrian
>


