[slurm-users] retrieve the jobs restarted?

Henry Gérard gerard.henry at irstea.fr
Tue Feb 13 01:52:28 MST 2018


Hello all,

As a workaround, I ended up using an EpilogSlurmctld script to archive the
jobs. In slurm.conf:
EpilogSlurmctld=/cm/local/apps/cmd/scripts/epilog-postjob
The script does:
scontrol show job -d "$SLURM_JOB_ID" >> "$JOBS_FILE"
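
Once the records are archived this way, the restarted jobs can be pulled back out of the file. Below is a minimal sketch, not something taken from our production setup: the function name restarted_jobs is made up for illustration, and it assumes the archive contains appended "scontrol show job -d" records in the key=value layout quoted further down. Since the epilog may run more than once for a requeued job, it keeps the highest Restarts value seen per job.

```shell
#!/bin/sh
# Sketch: list jobs whose archived record shows Restarts > 0.
# Assumption: the archive holds appended `scontrol show job -d` records,
# each beginning with a JobId=... token and containing a Restarts=... token.

restarted_jobs() {
    # $1: path to the archive file (whatever $JOBS_FILE points to)
    awk '
        {
            for (i = 1; i <= NF; i++) {
                # Remember the job the following tokens belong to.
                if ($i ~ /^JobId=/) { id = substr($i, 7) }
                # Keep the largest restart count seen for that job.
                if ($i ~ /^Restarts=/) {
                    n = substr($i, 10) + 0
                    if (n > 0 && n > max[id]) { max[id] = n }
                }
            }
        }
        END { for (id in max) print id, max[id] }
    ' "$1" | sort -n
}
```

Calling restarted_jobs "$JOBS_FILE" prints one line per restarted job: the job ID followed by its highest recorded restart count.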

Hth,

Gérard

On 09/02/2018 at 17:58, Henry Gérard wrote:
> Hello all,
> We run Slurm 15.08 with preemption configured, so jobs are sometimes
> killed and then restarted, and the "Restarts" attribute becomes greater than 0:
> [gerard.henry@xcluster ~]$ scontrol show job 157945
> JobId=157945 JobName=sleep
>     UserId=gerard.henry(1016) GroupId=grp1(1002)
>     Priority=1053144 Nice=0 Account=u_recover QOS=defaultqos
>     JobState=PENDING Reason=BeginTime Dependency=(null)
>     Requeue=1 Restarts=2 BatchFlag=1 Reboot=0 ExitCode=0:0
> 
> Is there a way to retrieve all the jobs (after they have completed) where
> Restarts is > 0?
> I found no such option in sacct.
> I tried -D (duplicates), but with thousands of jobs the output is
> unusable.
> Is this information stored in the accounting database?
> 
> Thanks in advance for your help,
> 

-- 
Gérard Henry
RSI Irstea Aix en Provence


