[slurm-users] job restart :: how to find the reason
Paul Edmon
pedmon at cfa.harvard.edu
Wed Dec 2 14:18:17 UTC 2020
You can dig through the slurmctld log and search for the JobID. That
should tell you what Slurm was doing at the time.
-Paul Edmon-
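
The log search suggested above can be sketched in a few lines of Python.
This is only an illustration, not part of the original reply. It assumes
the job ID shows up in the log as "JobId=<id>", "job_id <id>" or
"job <id>", which may not match what your Slurm version writes, so adjust
the pattern as needed. The function names are made up for this example.
The log location is site-specific; `scontrol show config` reports it as
SlurmctldLogFile.

```python
import re


def job_pattern(job_id):
    """Build a regex for common ways a job ID might appear in a log line.

    Assumption: the ID is written as "JobId=<id>", "job_id <id>" or
    "job <id>". The trailing lookahead keeps 1234 from matching 12345.
    """
    return re.compile(rf"\bjob(?:_?id)?[= ]{int(job_id)}(?!\d)", re.IGNORECASE)


def find_job_lines(lines, job_id):
    """Return the lines that mention the given job ID, in original order."""
    pattern = job_pattern(job_id)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def print_job_lines(log_path, job_id):
    """Print every line of the log file at log_path that mentions job_id."""
    # errors="replace" avoids a crash on stray non-UTF-8 bytes in the log.
    with open(log_path, errors="replace") as log:
        for line in find_job_lines(log, job_id):
            print(line)
```

Calling print_job_lines() with the path to your slurmctld log and the ID
of a restarted job prints the matching lines in chronological order; the
entries just before the job started again are the ones to read for the
requeue reason.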
On 12/2/2020 6:27 AM, Adrian Sevcenco wrote:
> Hi! I encountered a situation in which a bunch of jobs were restarted,
> as shown by Requeue=1 Restarts=1 BatchFlag=1 Reboot=0
> ExitCode=0:0
>
> So I would like to know how I can find out why there is a Requeue
> (when there is only one partition defined) and why there is a restart.
>
> Thanks a lot!!!
> Adrian
>