[slurm-users] Job ended with OUT_OF_MEMORY even though MaxRSS and MaxVMSize are under the ReqMem value

Chin,David dwc62 at drexel.edu
Mon Mar 15 17:52:41 UTC 2021

Hi, all:

I'm trying to understand why a job exited with an error condition. I think it was actually terminated by Slurm: the job was a MATLAB script, and its output was incomplete.

Here's the sacct output. Note that MaxRSS and MaxVMSize for the batch step are both under the ReqMem value:

               JobID    JobName      User  Partition        NodeList    Elapsed      State ExitCode     ReqMem     MaxRSS  MaxVMSize                        AllocTRES AllocGRE
-------------------- ---------- --------- ---------- --------------- ---------- ---------- -------- ---------- ---------- ---------- -------------------------------- --------
               83387 ProdEmisI+      foob        def         node001   03:34:26 OUT_OF_ME+    0:125      128Gn                               billing=16,cpu=16,node=1
         83387.batch      batch                              node001   03:34:26 OUT_OF_ME+    0:125      128Gn   1617705K   7880672K              cpu=16,mem=0,node=1
        83387.extern     extern                              node001   03:34:26  COMPLETED      0:0      128Gn       460K    153196K         billing=16,cpu=16,node=1
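
As a quick sanity check on those numbers, here is a minimal Python sketch that converts the sacct memory strings to bytes and compares them. It assumes the K/M/G suffixes are binary multiples of 1024 and that the trailing "n" on ReqMem means "per node" ("c" would mean "per CPU"); the helper name to_bytes is just for illustration and is not part of Slurm.

```python
# Binary multipliers assumed for sacct memory suffixes (assumption: 1K = 1024 bytes).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(value: str) -> int:
    """Convert a sacct memory string such as '1617705K' or '128Gn' to bytes.

    Hypothetical helper for illustration only.
    """
    value = value.strip()
    # ReqMem carries a trailing 'n' (per node) or 'c' (per CPU); drop it.
    if value and value[-1] in ("n", "c"):
        value = value[:-1]
    # Apply the unit multiplier if a known suffix is present.
    if value and value[-1].upper() in UNITS:
        return int(float(value[:-1]) * UNITS[value[-1].upper()])
    return int(float(value))


def gib(n_bytes: int) -> float:
    """Express a byte count in GiB."""
    return n_bytes / 1024**3


# Values copied from the 83387.batch row above.
req_mem = to_bytes("128Gn")
max_rss = to_bytes("1617705K")
max_vmsize = to_bytes("7880672K")

for name, val in [("ReqMem", req_mem), ("MaxRSS", max_rss), ("MaxVMSize", max_vmsize)]:
    print(f"{name:<10} {gib(val):8.2f} GiB")

print("MaxRSS under ReqMem:   ", max_rss < req_mem)
print("MaxVMSize under ReqMem:", max_vmsize < req_mem)
```

Under those assumptions, both recorded peaks come out far below the 128G request, which is what makes the OUT_OF_MEMORY state puzzling.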

Thanks in advance,

David Chin, PhD (he/him)   Sr. SysAdmin, URCF, Drexel
dwc62 at drexel.edu                     215.571.4335 (o)
For URCF support: urcf-support at drexel.edu
