[slurm-users] Providing users with info on wait time vs. run time

Hermann Schwärzler hermann.schwaerzler at uibk.ac.at
Fri Sep 16 12:07:13 UTC 2022


Hi Loris,
hi Sebastian,

thanks for the information on how you are doing this.
So you are both happily(?) ignoring this warning in the "Prolog and Epilog
Guide", right? :-)

"Prolog and Epilog scripts [...] should not call Slurm commands (e.g. 
squeue, scontrol, sacctmgr, etc)."

May I ask how big your clusters are (number of nodes) and how heavily 
they are used (submitted jobs per hour)?

Regards,
Hermann
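The warning quoted above is aimed at commands such as scontrol. One way to drop the scontrol lookup is to take the output path from SLURM_JOB_STDOUT, which the Prolog and Epilog Guide documents, and to skip the report whenever that variable is not set. This is a minimal sketch that assumes the variable is exported to the script (Sebastian reports further down that it was not in his node Epilog). The function name append_job_report is hypothetical. seff itself still reads Slurm's accounting data, so this removes only the scontrol call, not every query to Slurm.

```shell
#!/usr/bin/env bash
# Sketch (assumption): take the job's output path from SLURM_JOB_STDOUT only,
# and skip the report instead of falling back to scontrol when it is missing.

# Append a seff report to the job's stdout file, if that file is known.
append_job_report() {
    local target="${SLURM_JOB_STDOUT:-}"
    # Variable unset or empty: do nothing rather than call scontrol.
    [ -n "$target" ] || return 0
    # No regular file at that path: nothing to append to.
    [ -f "$target" ] || return 0
    {
        printf '\n== Job report ==\n\n'
        seff "$SLURM_JOB_ID"
    } >> "$target"
}
```

Skipping silently trades completeness for compliance: on systems where the variable is missing, no report is written at all.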

On 9/16/22 9:09 AM, Loris Bennett wrote:
> Hi Hermann,
> 
> Sebastian Potthoff <s.potthoff at uni-muenster.de> writes:
> 
>> Hi Hermann,
>>
>> I happened to be reading along with this conversation and was solving this very issue today. I added the following to the epilog script to make it work:
>>
>> # Add job report to stdout
>> StdOut=$(/usr/bin/scontrol show job=$SLURM_JOB_ID | /usr/bin/grep StdOut | /usr/bin/xargs | /usr/bin/awk 'BEGIN { FS = "=" } ; { print $2 }')
>>
>> NODELIST=($(/usr/bin/scontrol show hostnames))
>>
>> # Only add to StdOut file if it exists and if we are the first node
>> if [ "$(/usr/bin/hostname -s)" = "${NODELIST[0]}" ] && [ -n "${StdOut}" ]
>> then
>>    echo "################################# JOB REPORT ##################################" >> "$StdOut"
>>    /usr/bin/seff "$SLURM_JOB_ID" >> "$StdOut"
>>    echo "###############################################################################" >> "$StdOut"
>> fi
> 
> We do something similar.  At the end of our script pointed to by
> EpilogSlurmctld we have
> 
>    OUT=$(scontrol show jobid "${job_id}" | awk -F= '/ StdOut/{print $2}')
>    if [ ! -f "$OUT" ]; then
>      exit
>    fi
> 
>    printf "\n== Epilog Slurmctld ==================================================\n\n" >> "${OUT}"
> 
>    seff "${SLURM_JOB_ID}" >> "${OUT}"
> 
>    printf "\n======================================================================\n" >> "${OUT}"
> 
>    chown "${user}" "${OUT}"
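Both snippets above split the StdOut line of `scontrol show job` on "=", so an output path that itself contains "=" is cut off at that character. A slightly more defensive variant strips the `StdOut=` prefix instead. This is a sketch: the function name stdout_from_scontrol is hypothetical, and it assumes StdOut appears on its own line of the default multi-line scontrol output.

```shell
#!/usr/bin/env bash
# Sketch: extract the StdOut path from `scontrol show job` output on stdin.
# Stripping the "StdOut=" prefix (instead of splitting on every "=") keeps
# paths that themselves contain "=" intact.
stdout_from_scontrol() {
    awk '
        /^[ \t]*StdOut=/ {
            sub(/^[ \t]*StdOut=/, "")   # drop indentation and the key
            print
            exit                        # only the first match is needed
        }'
}

# Typical use inside an EpilogSlurmctld script (not run here):
#   OUT=$(scontrol show job "$SLURM_JOB_ID" | stdout_from_scontrol)
```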
> 
> Cheers,
> 
> Loris
> 
>> Contrary to what the Slurm docs at https://slurm.schedmd.com/prolog_epilog.html say, I was not able to use the environment variable SLURM_JOB_STDOUT, so I had to fetch the path via scontrol. I also had to
>> make sure the report is only written by the "leading" node: the epilog script is called on ALL nodes of a multi-node job, and they would all call seff and clutter up the output. Finally, I had to check that StdOut
>> is not empty (i.e. that it exists); otherwise interactive jobs would cause the node to drain.
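The two guards described above (only the first node writes, and only when StdOut is non-empty) can be pulled out into a small function that is testable without a running cluster. This is a sketch: the function name should_write_report and its argument order are hypothetical, and it assumes the node list has already been expanded to one host name per entry, as `scontrol show hostnames` does.

```shell
#!/usr/bin/env bash
# Sketch: decide whether this node should append the job report.
# Arguments: this host's short name, the StdOut path, then the job's
# expanded node list (one host name per argument).
should_write_report() {
    local host="$1" out="$2"
    shift 2
    # Empty StdOut (e.g. interactive jobs): nothing to append to.
    [ -n "$out" ] || return 1
    # Need at least one node, and only the first one writes.
    [ "$#" -gt 0 ] || return 1
    [ "$host" = "$1" ]
}

# Example use in a node Epilog (not run here):
#   mapfile -t nodes < <(scontrol show hostnames)
#   if should_write_report "$(hostname -s)" "$StdOut" "${nodes[@]}"; then
#       seff "$SLURM_JOB_ID" >> "$StdOut"
#   fi
```

Returning a status instead of writing directly keeps the decision separate from the seff call, which stays in the caller.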
>>
>> Maybe this helps.
>>
>> Kind regards
>> Sebastian
>>
>> PS: goslmailer looks quite nice with its recommendations! Will definitely look into it.
>>
>> --
>> Westfälische Wilhelms-Universität (WWU) Münster
>> WWU IT
>> Sebastian Potthoff (eScience / HPC)
>>
>>   On 15.09.2022 at 18:07, Hermann Schwärzler <hermann.schwaerzler at uibk.ac.at> wrote:
>>
>>   Hi Ole,
>>
>>   On 9/15/22 5:21 PM, Ole Holm Nielsen wrote:
>>   > On 15-09-2022 16:08, Hermann Schwärzler wrote:
>>   >> Just out of curiosity: how do you insert the output of seff into the out-file of a job?
>>   >
>>   > Use the "smail" tool from the slurm-contribs RPM and set this in slurm.conf:
>>   > MailProg=/usr/bin/smail
>>
>>   Maybe I am missing something but from what I can tell smail sends an email and does *not* change or append to the .out file of a job...
>>
>>   Regards,
>>   Hermann
> 


