[slurm-users] Show detailed information from a finished job
E.M. Dragowsky
dragowsky at case.edu
Thu Apr 23 12:42:52 UTC 2020
Hi, everyone --
Our take on using epilog is likely familiar to many, but perhaps not all.
Here is an extract from epilog:
/usr/local/slurm/epilogctld:/usr/bin/scontrol show job=$SLURM_JOB_ID --oneliner >> /usr/local/slurm/slurmrecord/$((SLURM_JOB_ID/10000)).record
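With --oneliner, each job is written as a single line that begins with JobId=<id>. A finished job can therefore be found again from its ID alone: compute the bucket file, then match that line. What follows is a minimal sketch, assuming the paths above. The names record_file_for, find_job_record and the RECORD_DIR override are hypothetical and introduced only for illustration.

```bash
#!/bin/bash
# Sketch only: record_file_for, find_job_record and RECORD_DIR are
# hypothetical names, not part of Slurm or of the epilog shown above.
RECORD_DIR="${RECORD_DIR:-/usr/local/slurm/slurmrecord}"

# Print the record file holding a numeric job ID
# (10000 job IDs per file, matching $((SLURM_JOB_ID/10000)) in the epilog).
record_file_for() {
    local jobid=$1
    echo "${RECORD_DIR}/$((jobid / 10000)).record"
}

# Print the record line(s) for a job ID. Each --oneliner line starts with
# "JobId=<id> ", so the match is anchored there to avoid hitting e.g.
# JobId=2309881 when looking for 230988.
find_job_record() {
    local jobid=$1
    local file
    file=$(record_file_for "$jobid")
    grep -E "^JobId=${jobid}( |$)" "$file"
}
```

For example, `find_job_record 230988` searches `23.record`. A requeued job may have more than one line, and grep prints all of them.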
The number of jobs per record file, and hence the file size, can be adjusted
by changing the divisor (10000 here). These 'record' files can then be
accessed and analyzed with any text extraction tool. We also keep a
corresponding archive of job submission scripts. For these, we've found it
more useful to preserve each script in the user-specified format. Note the
absence of '--oneliner':
/usr/local/slurm/epilogctld:cat /usr/local/slurm/slurmrecord/tmp-$SLURM_JOB_ID >> /usr/local/slurm/slurmrecord/$((SLURM_JOB_ID/10000)).script
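Once a job's line has been located in a .record file, individual Key=Value fields can be extracted with standard tools. JobState and ExitCode are the fields most relevant to a FAILED job. The following is a sketch; get_field is a hypothetical helper. It assumes the wanted value contains no spaces, so fields such as Command= or WorkDir= with embedded spaces would be truncated.

```bash
#!/bin/bash
# Sketch only: get_field is a hypothetical helper, not a Slurm tool.
# Print the value of one Key=Value field from a
# "scontrol show job --oneliner" line; return 1 if the key is absent.
# Assumes the value has no embedded spaces.
get_field() {
    local line=$1 key=$2 token
    local -a tokens
    # read -a splits on whitespace without pathname expansion, so values
    # such as NodeList=node[01-04] are not treated as globs.
    read -r -a tokens <<< "$line"
    for token in "${tokens[@]}"; do
        # Match only at the start of a token, so "JobId=" does not match
        # "ArrayJobId=".
        if [[ $token == "$key="* ]]; then
            printf '%s\n' "${token#"$key="}"
            return 0
        fi
    done
    return 1
}
```

For example: `get_field "$(grep '^JobId=230988 ' /usr/local/slurm/slurmrecord/23.record)" ExitCode`.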
Cheers
~ E.M.
On Thu, Apr 23, 2020 at 5:46 AM mercan <ahmet.mercan at uhem.itu.edu.tr> wrote:
> Sorry, I mistakenly cropped the "mkdir" line from the script below:
>
> mkdir -p $JDIR
>
> It should go right after the "JDIR=/okyanus/..." line.
>
> Regards;
>
> Ahmet M.
>
>
> On 23.04.2020 12:31, mercan wrote:
> > Hi;
> >
> > I prefer to use an epilog script to store the job information in a top
> > directory owned by the slurm user. To avoid a single directory with a lot
> > of files, it creates one sub-directory per thousand job files. For a job
> > whose job ID is 230988, it creates a directory named 230XXX.
> > The SLURM_JOB_ID of a job array is also a problem, because Slurm uses
> > an ugly format (298903_[3%1]). For these reasons, my script is a little
> > complex, but it works (I cropped the other non-relevant parts):
> >
> > #!/bin/bash
> >
> > if [ "x$SLURM_ARRAY_JOB_ID" != "x" ]
> > then
> > JOBNO="${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}"
> > else
> > JOBNO="${SLURM_JOB_ID}"
> > fi
> > JI=${JOBNO//_*/}
> > JWIDE=${#JI}
> > JIDLEN=0
> > JIDLEN=$((JWIDE-3))
> > JDIR=/okyanus/SLURM/log/jobs/${JI:0:$JIDLEN}XXX
> > echo "===========================================================================" &>>$JDIR/${JI}.txt
> > scontrol show job -dd "$JOBNO" &>>$JDIR/${JI}.txt && \
> >   echo "===========================================================================" >>$JDIR/${JI}.txt && \
> >   scontrol write batch_script "$SLURM_JOBID" - >>$JDIR/${JI}.txt
> > exit 0
> >
> > Regards;
> >
> > Ahmet M.
> >
> >
> > On 23.04.2020 10:33, Gestió Servidors wrote:
> >>
> >> Hello,
> >>
> >> When a job is “pending” or “running”, with “scontrol show
> >> jobid=#jobnumber” I can get some useful information, but when the
> >> job has finished, that command doesn’t return anything. For example,
> >> if I run a “sacct” and I see that some jobs have finished with state
> >> “FAILED”, how can I get detailed information from that job?
> >>
> >> Thanks.
> >>
> >
>
>
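The job-ID bucketing in Ahmet's epilog script quoted above can be isolated into two small functions for testing. The logic is: strip any array suffix ("298903_3" becomes "298903"), then replace the last three digits with XXX ("230988" becomes "230XXX"). This is a sketch, not his exact script. The names job_number, job_dir_for and BASE_DIR are hypothetical. IDs shorter than four digits are sent to a plain "XXX" bucket here, which is an assumption of this sketch.

```bash
#!/bin/bash
# Sketch only: job_number, job_dir_for and BASE_DIR are hypothetical names.
BASE_DIR="${BASE_DIR:-/okyanus/SLURM/log/jobs}"

# Build the job number the same way the quoted script does:
# "<arrayjob>_<task>" for array tasks, plain SLURM_JOB_ID otherwise.
job_number() {
    if [ -n "${SLURM_ARRAY_JOB_ID:-}" ]; then
        echo "${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}"
    else
        echo "${SLURM_JOB_ID}"
    fi
}

# Map a job number to its per-thousand directory.
job_dir_for() {
    local jobno=$1
    local ji=${jobno%%_*}          # keep everything before the first "_"
    if (( ${#ji} <= 3 )); then
        # Assumption: IDs with three or fewer digits share one "XXX" bucket.
        echo "${BASE_DIR}/XXX"
    else
        echo "${BASE_DIR}/${ji:0:${#ji}-3}XXX"
    fi
}
```

In an epilog this might be used as `JDIR=$(job_dir_for "$(job_number)"); mkdir -p "$JDIR"`, which also covers the mkdir line Ahmet noted was missing.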
--
E.M. (Em) Dragowsky, Ph.D.
Research Computing -- UTech
Case Western Reserve University
(216) 368-0082 (currently forwarding to my cell phone)