[slurm-users] Retrieving jobs that were restarted?
Henry Gérard
gerard.henry at irstea.fr
Fri Feb 9 09:58:23 MST 2018
Hello all,
We run Slurm 15.08 with preemption configured, so jobs are sometimes
killed and then restarted, and their "Restarts" attribute becomes greater than 0:
[gerard.henry at xcluster ~]$ scontrol show job 157945
JobId=157945 JobName=sleep
UserId=gerard.henry(1016) GroupId=grp1(1002)
Priority=1053144 Nice=0 Account=u_recover QOS=defaultqos
JobState=PENDING Reason=BeginTime Dependency=(null)
Requeue=1 Restarts=2 BatchFlag=1 Reboot=0 ExitCode=0:0
Is there a way to retrieve all the jobs (after they have completed) whose
Restarts count is greater than 0?
I found nothing like that in sacct.
I tried -D (--duplicates), but with thousands of jobs the output is
unusable.
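One way to make the -D output manageable might be to reduce it to one job ID per record and count the repeats. This is a minimal sketch, assuming that each requeue leaves a separate allocation record that sacct -D lists under the same JobID, and that job IDs were never reset (reset IDs would also show up as duplicates). The function name restarted_jobs and the start date are hypothetical.

```shell
# Hypothetical helper: reads one job ID per line on stdin and prints every
# ID that appears more than once, followed by the number of extra records.
# Assumption: extra records for the same ID come from requeues/restarts.
restarted_jobs() {
    sort | uniq -c | awk '$1 > 1 { print $2, $1 - 1 }'
}

# Assumed usage: -X keeps only allocation lines (no steps), -n drops the
# header, -P gives plain pipe-separated output, -o JobID prints only the ID,
# and -S sets a (hypothetical) start date for the search window.
# sacct -D -X -n -P -o JobID -S 2018-01-01 | restarted_jobs
```

Each output line is a job ID followed by its count of extra records, which under the assumption above should correspond to the job's Restarts value.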
Is this information stored in the accounting database?
Thanks in advance for your help,
--
Gérard Henry