[slurm-users] Providing users with info on wait time vs. run time

Loris Bennett loris.bennett at fu-berlin.de
Fri Sep 16 11:50:37 UTC 2022


Hi Sebastian,

Sebastian Potthoff <s.potthoff at uni-muenster.de> writes:

> Hi Loris
>
>  We do something similar.  At the end of our script pointed to by
>  EpilogSlurmctld we have
>
> Using EpilogSlurmctld only works if the slurmctld user is root (or slurm
> with root privileges), right? I opted for the normal Epilog because we
> wanted to avoid running Slurm as root, and that way I don't have to worry
> about ownership of the output file.

Yes, good point.  We should look into that.
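
On the ownership question: appending with >> to a file that already
exists does not change its owner, so one way to sidestep chown is never
to create the file from the epilog. A minimal sketch, assuming the report
text arrives on stdin; append_report is a hypothetical helper name, not
part of Slurm, and the separator lines are placeholders:

```shell
#!/bin/sh
# append_report FILE: append stdin to FILE, framed by separator lines.
# Hypothetical helper. It does nothing when FILE is an empty string
# (e.g. an interactive job) or is not an existing regular file, so the
# epilog never creates a file owned by the wrong user.
append_report() {
    out=$1
    if [ -z "$out" ] || [ ! -f "$out" ]; then
        # Drain stdin so the writer on the other end of the pipe
        # can finish cleanly, then skip quietly.
        cat > /dev/null
        return 0
    fi
    {
        printf '\n== Job report ==\n'
        cat
        printf '== End of job report ==\n'
    } >> "$out"
}
```

In an epilog this could be called as

    seff "$SLURM_JOB_ID" | append_report "$StdOut"

where StdOut holds the path obtained from scontrol.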

Cheers,

Loris


> Sebastian
>
>  On 16.09.2022 at 09:09, Loris Bennett <loris.bennett at fu-berlin.de> wrote:
>
>  Hi Hermann,
>
>  Sebastian Potthoff <s.potthoff at uni-muenster.de> writes:
>
>  Hi Hermann,
>
>  I happened to be following this conversation and was solving this very issue today. I added this part to the epilog script to make it work:
>
>  # Add job report to stdout
>  StdOut=$(/usr/bin/scontrol show job=$SLURM_JOB_ID | /usr/bin/grep StdOut | /usr/bin/xargs | /usr/bin/awk 'BEGIN { FS = "=" } ; { print $2 }')
>
>  NODELIST=($(/usr/bin/scontrol show hostnames))
>
>  # Only append if we are the first node of the job and StdOut is non-empty
>  if [ "$(/usr/bin/hostname -s)" = "${NODELIST[0]}" ] && [ -n "${StdOut}" ]
>  then
>   echo "################################# JOB REPORT ##################################" >> "$StdOut"
>   /usr/bin/seff "$SLURM_JOB_ID" >> "$StdOut"
>   echo "###############################################################################" >> "$StdOut"
>  fi
>
>  We do something similar.  At the end of our script pointed to by
>  EpilogSlurmctld we have
>
>   # job_id and user are assumed to be set earlier in the script
>   OUT=$(scontrol show jobid "${job_id}" | awk -F= '/ StdOut/{print $2}')
>   if [ ! -f "$OUT" ]; then
>     exit
>   fi
>
>   printf "\n== Epilog Slurmctld ==================================================\n\n" >> "${OUT}"
>
>   seff "${SLURM_JOB_ID}" >> "${OUT}"
>
>   printf "\n======================================================================\n" >> "${OUT}"
>
>   chown "${user}" "${OUT}"
>
>  Cheers,
>
>  Loris
>
>  Contrary to what it says in the Slurm docs (https://slurm.schedmd.com/prolog_epilog.html), I was not able to
>  use the env var SLURM_JOB_STDOUT, so I had to fetch the path via scontrol. In addition, I had to make sure the
>  report is only written by the "leading" node, because the epilog script is called by ALL nodes of a multinode
>  job; otherwise they would all call seff and clutter up the output. The last thing was to check that StdOut is
>  not of length zero (i.e. that an output path is set). Interactive jobs would otherwise cause the node to drain.
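
The two checks described above (fetch the StdOut path via scontrol, act
only on the leading node) can be sketched as small functions. This is an
illustration under assumptions, not the script quoted in this thread:
extract_stdout and is_lead_node are hypothetical names, and the sketch
assumes the "scontrol show job" output contains a line of the form
"StdOut=<path>". Stripping only the "StdOut=" prefix, rather than
splitting on "=", keeps paths that themselves contain "=" intact.

```shell
#!/bin/sh
# extract_stdout: read `scontrol show job` output on stdin and print
# the StdOut path (empty output if there is no StdOut line).
extract_stdout() {
    sed -n 's/^[[:space:]]*StdOut=//p' | head -n 1
}

# is_lead_node HOST NODE...: succeed only if HOST equals the first NODE.
is_lead_node() {
    me=$1
    shift
    [ "$#" -gt 0 ] && [ "$me" = "$1" ]
}

# Possible use inside an Epilog (needs a running Slurm job context):
#   StdOut=$(scontrol show job "$SLURM_JOB_ID" | extract_stdout)
#   if is_lead_node "$(hostname -s)" $(scontrol show hostnames) \
#        && [ -n "$StdOut" ]; then
#       seff "$SLURM_JOB_ID" >> "$StdOut"
#   fi
```

Keeping the parsing separate from the scontrol call makes it possible to
try the functions on saved output without a running job.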
>
>  Maybe this helps. 
>
>  Kind regards
>  Sebastian
>
>  PS: goslmailer looks quite nice with its recommendations! Will definitely look into it.
>
>  --
>  Westfälische Wilhelms-Universität (WWU) Münster
>  WWU IT
>  Sebastian Potthoff (eScience / HPC)
>
>  On 15.09.2022 at 18:07, Hermann Schwärzler <hermann.schwaerzler at uibk.ac.at> wrote:
>
>  Hi Ole,
>
>  On 9/15/22 5:21 PM, Ole Holm Nielsen wrote:
>
>  On 15-09-2022 16:08, Hermann Schwärzler wrote:
>
>  Just out of curiosity: how do you insert the output of seff into the out-file of a job?
>
>  Use the "smail" tool from the slurm-contribs RPM and set this in slurm.conf:
>  MailProg=/usr/bin/smail
>
>  Maybe I am missing something but from what I can tell smail sends an email and does *not* change or append to the .out file of a job...
>
>  Regards,
>  Hermann
>
-- 
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de


