<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 16, 2022 at 3:43 PM Sebastian Potthoff <<a href="mailto:s.potthoff@uni-muenster.de">s.potthoff@uni-muenster.de</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div>Hi Hermann,</div><br><blockquote type="cite"><blockquote type="cite">So you both are happily(?) ignoring this warning the "Prolog and Epilog Guide",<br>right? :-)<br><br>"Prolog and Epilog scripts [...] should not call Slurm commands (e.g. squeue,<br>scontrol, sacctmgr, etc)."<br></blockquote><br><span style="float:none;display:inline">We have probably been doing this since before the warning was added to</span><br><span style="float:none;display:inline">the documentation.  So we are "ignorantly ignoring" the advice :-/</span></blockquote><div><br></div><div>Same here :) But if $SLURM_JOB_STDOUT is not defined as documented … what can you do.</div></div></blockquote><div><br></div><div>FYI: SLURM_JOB_STDOUT, along with several other environment variables, was added in Slurm 22.05 (see <a href="https://slurm.schedmd.com/news.html">https://slurm.schedmd.com/news.html</a>), so it might not be available if you are running an older Slurm version. </div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div><br></div><div><blockquote type="cite"><blockquote type="cite">May I ask how big your clusters are (number of nodes) and how heavily they are<br>used (submitted jobs per hour)?</blockquote></blockquote></div><div><br></div><div>We have around 500 nodes (mostly 2x18 cores). The number of jobs ending (i.e. calling the epilog script) varies quite a lot, between 1000 and 15k a day, so somewhere between roughly 40 and 625 jobs/hour. 
During those peaks Slurm can become noticeably slower, but usually it runs fine.</div><div><br></div><div>Sebastian </div><div><br><blockquote type="cite"><div>On 16.09.2022 at 15:15, Loris Bennett <<a href="mailto:loris.bennett@fu-berlin.de" target="_blank">loris.bennett@fu-berlin.de</a>> wrote:</div><br><div><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">Hi Hermann,</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">Hermann Schwärzler <</span><a href="mailto:hermann.schwaerzler@uibk.ac.at" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" target="_blank">hermann.schwaerzler@uibk.ac.at</a><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">> writes:</span><br
style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><blockquote type="cite" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">Hi Loris,<br>hi Sebastian,<br><br>thanks for the information on how you are doing this.<br>So you both are happily(?) ignoring this warning the "Prolog and Epilog Guide",<br>right? :-)<br><br>"Prolog and Epilog scripts [...] should not call Slurm commands (e.g. squeue,<br>scontrol, sacctmgr, etc)."<br></blockquote><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">We have probably been doing this since before the warning was added to</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span 
style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">the documentation.  So we are "ignorantly ignoring" the advice :-/</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><blockquote type="cite" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">May I ask how big your clusters are (number of nodes) and how heavily they are<br>used (submitted jobs per hour)?<br></blockquote><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">We have around 190 32-core nodes.  
I don't know how I would easily find</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">out the average number of jobs per hour.  The only problems we have had</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">with submission have been when people have written their own mechanisms</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">for submitting thousands of jobs.  
Once we get them to use job arrays,</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">such problems generally disappear.</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">Cheers,</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span
style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">Loris</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><blockquote type="cite" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">Regards,<br>Hermann<br><br>On 9/16/22 9:09 AM, Loris Bennett wrote:<br><blockquote type="cite">Hi Hermann,<br>Sebastian Potthoff <<a href="mailto:s.potthoff@uni-muenster.de" target="_blank">s.potthoff@uni-muenster.de</a>> writes:<br><br><blockquote type="cite">Hi Hermann,<br><br>I happened to read along this conversation and was just solving this issue today. I added this part to the epilog script to make it work:<br><br># Add job report to stdout<br>StdOut=$(/usr/bin/scontrol show job=$SLURM_JOB_ID | /usr/bin/grep StdOut | /usr/bin/xargs | /usr/bin/awk 'BEGIN { FS = "=" } ; { print $2 }')<br><br>NODELIST=($(/usr/bin/scontrol show hostnames))<br><br># Only add to StdOut file if it exists and if we are the first node<br>if [ "$(/usr/bin/hostname -s)" = "${NODELIST[0]}" -a ! 
-z "${StdOut}" ]<br>then<br>  echo "################################# JOB REPORT ##################################" >> $StdOut<br>  /usr/bin/seff $SLURM_JOB_ID >> $StdOut<br>  echo "###############################################################################" >> $StdOut<br>fi<br></blockquote>We do something similar.  At the end of our script pointed to by<br>EpilogSlurmctld we have<br>  OUT=`scontrol show jobid ${job_id} | awk -F= '/ StdOut/{print $2}'`<br>  if [ ! -f "$OUT" ]; then<br>    exit<br>  fi<br>  printf "\n== Epilog Slurmctld ==================================================\n\n" >>  ${OUT}<br>  seff ${SLURM_JOB_ID} >> ${OUT}<br>  printf "\n======================================================================\n" >> ${OUT}<br>  chown ${user} ${OUT}<br>Cheers,<br>Loris<br><br><blockquote type="cite">  Contrary to what it says in the slurm docs <a href="https://slurm.schedmd.com/prolog_epilog.html" target="_blank">https://slurm.schedmd.com/prolog_epilog.html</a>  I was not able to use the env var SLURM_JOB_STDOUT, so I had to fetch it via scontrol. In addition I had to<br>make sure it is only called by the „leading“ node as the epilog script will be called by ALL nodes of a multinode job and they would all call seff and clutter up the output. Last thing was to check if StdOut is<br>not of length zero (i.e. it exists). Interactive jobs would otherwise cause the node to drain.<br><br>Maybe this helps.<br><br>Kind regards<br>Sebastian<br><br>PS: goslmailer looks quite nice with its recommendations! 
Will definitely look into it.<br><br>--<br>Westfälische Wilhelms-Universität (WWU) Münster<br>WWU IT<br>Sebastian Potthoff (eScience / HPC)<br><br> On 15.09.2022 at 18:07, Hermann Schwärzler <<a href="mailto:hermann.schwaerzler@uibk.ac.at" target="_blank">hermann.schwaerzler@uibk.ac.at</a>> wrote:<br><br> Hi Ole,<br><br> On 9/15/22 5:21 PM, Ole Holm Nielsen wrote:<br><br> On 15-09-2022 16:08, Hermann Schwärzler wrote:<br><br> Just out of curiosity: how do you insert the output of seff into the out-file of a job?<br><br> Use the "smail" tool from the slurm-contribs RPM and set this in slurm.conf:<br> MailProg=/usr/bin/smail<br><br> Maybe I am missing something but from what I can tell smail sends an email and does *not* change or append to the .out file of a job...<br><br> Regards,<br> Hermann<br></blockquote><br></blockquote><br></blockquote><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">--<span> </span></span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">Dr. 
Loris Bennett (Herr/Mr)</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">ZEDAT, Freie Universität Berlin         Email<span> </span></span><a href="mailto:loris.bennett@fu-berlin.de" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" target="_blank">loris.bennett@fu-berlin.de</a></div></blockquote></div><br></div></blockquote></div></div>
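<div>For sites that run a mix of Slurm versions, the two approaches in this thread could be combined along these lines. This is only a minimal sketch, not a tested epilog: it uses SLURM_JOB_STDOUT when that variable is set (assumed to be the case from 22.05 on) and otherwise falls back to the scontrol lookup shown in the quoted scripts. The function names resolve_job_stdout and append_job_report are made up for illustration, and it assumes scontrol and seff are on the epilog's PATH.</div>

```shell
#!/bin/bash
# Sketch only: the function names below are hypothetical, not part of Slurm.

# Print the path of the job's stdout file.
# Prefers SLURM_JOB_STDOUT (assumed to be set by Slurm 22.05 or later) and
# falls back to parsing `scontrol show job` on older versions.
resolve_job_stdout() {
    if [ -n "${SLURM_JOB_STDOUT:-}" ]; then
        printf '%s\n' "$SLURM_JOB_STDOUT"
        return 0
    fi
    # The fallback needs both scontrol and a job ID.
    command -v scontrol >/dev/null 2>&1 || return 1
    [ -n "${SLURM_JOB_ID:-}" ] || return 1
    # Strip the "StdOut=" prefix; this keeps any '=' inside the path itself.
    scontrol show job "$SLURM_JOB_ID" |
        awk '/^[ \t]*StdOut=/ { sub(/^[ \t]*StdOut=/, ""); print }'
}

# Append the seff report to the job's stdout file, if that file exists.
append_job_report() {
    local out
    out=$(resolve_job_stdout) || return 0
    # Interactive jobs have no stdout file; return quietly instead of
    # failing, because a failing epilog would drain the node.
    [ -n "$out" ] && [ -f "$out" ] || return 0
    command -v seff >/dev/null 2>&1 || return 0
    {
        echo "############################ JOB REPORT ############################"
        seff "$SLURM_JOB_ID"
        echo "####################################################################"
    } >> "$out"
}
```

<div>An epilog would call append_job_report near its end. When this runs as Epilog on the compute nodes (rather than as EpilogSlurmctld), the first-node check from Sebastian's script is still needed so that only one node of a multinode job writes the report.</div>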