<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none"><!--P{margin-top:0;margin-bottom:0;} p
{margin-top:0;
margin-bottom:0}--></style>
</head>
<body dir="ltr" style="font-size:12pt;color:#000000;background-color:#FFFFFF;font-family:Calibri,Arial,Helvetica,sans-serif;">
<p>A confirmation re-run yielded the same outcome, but the expected exit code (141) was available using scontrol:
</p>
<p><span style="font-size: 10pt; font-family: "Courier New", monospace;"><span style="font-family: "Courier New", monospace;">$ scontrol show job 197</span><br style="font-family: "Courier New", monospace;">
</span></p>
<p><span style="font-size: 10pt; font-family: "Courier New", monospace;"><span style="font-family: "Courier New", monospace;"> JobState=FAILED Reason=NonZeroExitCode Dependency=(null)</span></span><br style="font-size: 10pt; font-family: "Courier New", monospace;">
<span style="font-size: 10pt; font-family: "Courier New", monospace;"><span style="font-family: "Courier New", monospace;"> Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=141:0</span></span><br>
</p>
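<p>For reference, both scontrol and sacct print ExitCode as two numbers separated by a colon: the exit code, then the signal (if any) that terminated the process. A minimal sketch for splitting such a value in bash follows; parse_exit_code is a hypothetical helper name, not a Slurm command.</p>

```shell
#!/bin/bash
# Split a Slurm "ExitCode=code:signal" value into its two parts.
# parse_exit_code is a hypothetical helper name, not part of Slurm.
parse_exit_code() {
    local field=${1#ExitCode=}    # drop the "ExitCode=" prefix if present
    local code=${field%%:*}       # text before the colon: the exit code
    local signal=${field##*:}     # text after the colon: the signal number
    echo "code=$code signal=$signal"
}

parse_exit_code "ExitCode=141:0"   # value shown by scontrol for job 197
parse_exit_code "13:0"             # value shown by sacct for job 197
```

<p>Run on the two values from this thread, it prints code=141 signal=0 and code=13 signal=0, so neither tool reports a terminating signal.</p>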
<p><br>
</p>
<p>sacct still reports 13:0, as before:</p>
<p><span style="font-size: 10pt; font-family: "Courier New", monospace;"><span style="font-family: "Courier New", monospace;">$ sacct -j 197</span></span><br style="font-size: 10pt; font-family: "Courier New", monospace;">
<span style="font-size: 10pt; font-family: "Courier New", monospace;"><span style="font-family: "Courier New", monospace;"> JobID JobName Partition Account AllocCPUS State ExitCode
</span></span><br style="font-size: 10pt; font-family: "Courier New", monospace;">
<span style="font-size: 10pt; font-family: "Courier New", monospace;"><span style="font-family: "Courier New", monospace;">------------ ---------- ---------- ---------- ---------- ---------- --------
</span></span><br style="font-size: 10pt; font-family: "Courier New", monospace;">
<span style="font-size: 10pt; font-family: "Courier New", monospace;"><span style="font-family: "Courier New", monospace;">197 T_113491_+ all_slt_l+ slt 1 FAILED 13:0
</span></span><br style="font-size: 10pt; font-family: "Courier New", monospace;">
<span style="font-size: 10pt; font-family: "Courier New", monospace;"><span style="font-family: "Courier New", monospace;">197.batch batch slt 1 FAILED 13:0</span></span><br>
</p>
<p><br>
</p>
<p>Matt<br>
</p>
<div dir="ltr" style="font-size:12pt; color:#000000; background-color:#FFFFFF; font-family:Calibri,Arial,Helvetica,sans-serif">
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> Matthew Goulden<br>
<b>Sent:</b> Friday, November 23, 2018 11:21 AM<br>
<b>To:</b> slurm-users@lists.schedmd.com<br>
<b>Subject:</b> new user; ExitCode reporting</font>
<div> </div>
</div>
<div>
<p>Hi All,</p>
<p><br>
</p>
<p>New user migrating from UGE/SGE. I'm baffled by the ExitCode recorded in slurmdb; I'm not sure whether this is a 'new user' issue or a bug, so I'm raising it here first.<br>
</p>
<p><br>
</p>
<p>I'm running simple sbatch scripts with these relevant headers:</p>
<p>#!/bin/bash<br>
</p>
<p>#SBATCH --mail-user &lt;me&gt;@&lt;work&gt;<br>
#SBATCH --mail-type END</p>
<p>#SBATCH -J T_113491_&lt;redacted&gt;_20150522<br>
</p>
<p><br>
</p>
<p>The sbatch script calls various tools and finally a 'completion_reporter' bash script, which reports whether all calls ran to completion.</p>
<p>If they did not, the return code from that script is passed to an exit command in the sbatch script; the expectation is that the sbatch script's return code in these circumstances is the one from 'completion_reporter'. That return code is 141.</p>
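<p>The pattern described above can be sketched roughly as follows. This is an illustration only, not the actual script: completion_reporter is replaced by a hypothetical stand-in function, and finish_job is an assumed name for the tail of the batch script.</p>

```shell
#!/bin/bash
# Hypothetical stand-in for the real completion_reporter script:
# returns 0 when every step completed, otherwise a coded failure value.
completion_reporter() {
    local failed_steps=$1
    if [ "$failed_steps" -eq 0 ]; then
        return 0
    fi
    return 141    # the failure code discussed in this thread
}

# Tail of the batch script: run the reporter last and exit with its code.
finish_job() {
    local rc=0
    completion_reporter "$1" || rc=$?
    exit "$rc"
}

# Run in subshells so the demonstration continues after each exit.
rc=0; ( finish_job 0 ) || rc=$?
echo "all steps completed: exit code $rc"
rc=0; ( finish_job 1 ) || rc=$?
echo "one step failed: exit code $rc"
```

<p>The second call ends with exit code 141, which is the value expected in the accounting record.</p>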
<p><br>
</p>
<p>GOOD<br>
</p>
<p>The emails received have a subject line consistent with expectations:<br>
</p>
<p>'<span class="rpHighlightAllClass rpHighlightSubjectClass" tabindex="-1" style="">Slurm Job_id=196 Name=T_113491_&lt;redacted&gt;_20150522 Ended, Run time 00:00:24, FAILED, ExitCode 141'<br>
</span></p>
<p><span class="rpHighlightAllClass rpHighlightSubjectClass" tabindex="-1" style=""><br>
</span></p>
<p><span class="rpHighlightAllClass rpHighlightSubjectClass" tabindex="-1" style="">UNEXPECTED<br>
</span></p>
<p><span class="rpHighlightAllClass rpHighlightSubjectClass" tabindex="-1" style="">However, sacct output is not consistent with expectations...<br>
</span></p>
<p><span class="rpHighlightAllClass rpHighlightSubjectClass" tabindex="-1" style=""></span>$ sacct -j 196
</p>
<p><span style="font-family:"Courier New",monospace; font-size:10pt">------------ ---------- ---------- ---------- ---------- ---------- --------
</span><br style="font-family:"Courier New",monospace; font-size:10pt">
<span style="font-family:"Courier New",monospace; font-size:10pt">196 T_113491_+ all_slt_l+ slt 1 FAILED 13:0
</span><br style="font-family:"Courier New",monospace; font-size:10pt">
<span style="font-family:"Courier New",monospace; font-size:10pt">196.batch batch slt 1 FAILED 13:0
</span><br>
</p>
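<p>One observation that may or may not explain the discrepancy: the two values differ by exactly 128, and 13 is what remains of 141 when only the low 7 bits are kept. Shells conventionally report 128 + N when a command is killed by signal N, and signal 13 is SIGPIPE on Linux. Whether Slurm's accounting applies such a conversion here is an assumption that nothing in this thread confirms; the sketch below only checks the arithmetic.</p>

```shell
#!/bin/bash
reported=141   # exit code in the mail subject
recorded=13    # exit code shown by sacct

# Difference between the two values.
echo "difference: $(( reported - recorded ))"

# Remainder after dividing by 128, i.e. the low 7 bits of the reported code.
echo "low 7 bits of $reported: $(( reported % 128 ))"
```

<p>This prints a difference of 128 and a remainder of 13.</p>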
<p><br>
</p>
<p><br>
</p>
<p>I've spent some time reading through the (excellent, frankly) documentation for sbatch and job_exit_code and, while I learned a great deal, nothing has explained this anomaly.</p>
<p><br>
</p>
<p>Incidentally, I expected to be able to use scontrol as below; any pointers on the unexpected outcome would be welcome.<br>
</p>
<p><span style="font-size:10pt; font-family:"Courier New",monospace"><span style="font-family:"Courier New",monospace">$ scontrol show step 196.batch</span></span><br style="font-size:10pt; font-family:"Courier New",monospace">
<span style="font-size:10pt; font-family:"Courier New",monospace"><span style="font-family:"Courier New",monospace">Job step 196.0 not found</span></span><br>
<br>
</p>
<p><br>
</p>
<p>We have put a fair bit of work into making our failure exit codes informative, so suggestions as to what's going on here would be welcome.</p>
<p><br>
</p>
<p>Thanks in advance<br>
</p>
<p><br>
</p>
<p>Matt</p>
<p><br>
</p>
<p><br>
</p>
</div>
</div>
<br clear="both">
</body>
</html>