<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: 'Courier New', monospace; font-size: 12pt; color: rgb(0, 0, 0);">
One possible data point: on the node where the job ran, there were two slurmstepd processes running, both at 100% CPU, even after the job had ended.</div>
<div style="font-family: 'Courier New', monospace; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div>
<div style="font-family: 'Courier New', monospace; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div id="Signature">
<div>
<div></div>
<div></div>
<div></div>
<div id="divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:#000000; font-family:'Courier New',monospace">
<div class="BodyFragment"><font size="2"><span style="font-size:10pt">
<div class="PlainText"></div>
<div class="PlainText" style="font-family:'Courier New',monospace; font-size:13.3333px">
</div>
<span id="ms-rterangepaste-start"></span>
<div>--</div>
<div>
<div>David Chin, PhD (he/him)   Sr. SysAdmin, URCF, Drexel</div>
<div>dwc62@drexel.edu                     215.571.4335 (o)</div>
<div>For URCF support: urcf-support@drexel.edu</div>
<div>https://proteusmaster.urcf.drexel.edu/urcfwiki</div>
<div>github:prehensilecode</div>
</div>
<span id="ms-rterangepaste-end"></span>
<div class="PlainText"><br>
</div>
</span></font></div>
</div>
</div>
</div>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt; on behalf of Chin,David &lt;dwc62@drexel.edu&gt;<br>
<b>Sent:</b> Monday, March 15, 2021 13:52<br>
<b>To:</b> Slurm-Users List &lt;slurm-users@lists.schedmd.com&gt;<br>
<b>Subject:</b> [slurm-users] Job ended with OUT_OF_MEMORY even though MaxRSS and MaxVMSize are under the ReqMem value</font>
<div> </div>
</div>
<style type="text/css" style="display:none">
<!--
p
        {margin-top:0;
        margin-bottom:0}
-->
</style>
<div dir="ltr">
<div>
<div style="font-family:'Courier New',monospace; font-size:12pt; color:rgb(0,0,0)">
Hi, all:</div>
<div style="font-family:'Courier New',monospace; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:'Courier New',monospace; font-size:12pt; color:rgb(0,0,0)">
I'm trying to understand why a job exited with an error condition. I think it was actually terminated by Slurm: the job was a MATLAB script, and its output was incomplete.</div>
<div style="font-family:'Courier New',monospace; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:'Courier New',monospace; font-size:12pt; color:rgb(0,0,0)">
Here's the sacct output:</div>
<div style="font-family:'Courier New',monospace; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:'Courier New',monospace; font-size:12pt; color:rgb(0,0,0)">
<pre>
               JobID    JobName      User  Partition        NodeList    Elapsed      State ExitCode     ReqMem     MaxRSS  MaxVMSize                        AllocTRES AllocGRE
-------------------- ---------- --------- ---------- --------------- ---------- ---------- -------- ---------- ---------- ---------- -------------------------------- --------
               83387 ProdEmisI+      foob        def         node001   03:34:26 OUT_OF_ME+    0:125      128Gn                               billing=16,cpu=16,node=1
         83387.batch      batch                              node001   03:34:26 OUT_OF_ME+    0:125      128Gn   1617705K   7880672K              cpu=16,mem=0,node=1
        83387.extern     extern                              node001   03:34:26  COMPLETED      0:0      128Gn       460K    153196K         billing=16,cpu=16,node=1
</pre>
</div>
<div style="font-family:'Courier New',monospace; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div>
<div style="font-family:'Courier New',monospace; font-size:12pt; color:rgb(0,0,0)">
Thanks in advance,</div>
<div style="font-family:'Courier New',monospace; font-size:12pt; color:rgb(0,0,0)">
    Dave</div>
<div style="font-family:'Courier New',monospace; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
</div>
<br>
<p align="Left" style="font-family:Calibri; font-size:10pt; color:#000000; margin:5pt">
Drexel Internal Data<br>
</p>
</div>
</div>
</body>
</html>