[slurm-users] [EXT] Job ended with OUT_OF_MEMORY even though MaxRSS and MaxVMSize are under the ReqMem value
Sean Crosby
scrosby at unimelb.edu.au
Mon Mar 15 19:22:08 UTC 2021
What are your Slurm settings? Specifically, what are the values of
ProctrackType
JobAcctGatherType
JobAcctGatherParams
and what are the contents of cgroup.conf? Also, what version of Slurm are you
using?
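
A minimal sketch of one way to pull those values out in one go. It assumes
`scontrol` is on the PATH and that `scontrol show config` prints one
`Key = Value` pair per line, with the version reported under SLURM_VERSION.
The names `WANTED`, `parse_slurm_config` and `read_slurm_config` are made up
for illustration.

```python
import subprocess

# Settings asked about above. SLURM_VERSION is assumed to be the key under
# which `scontrol show config` reports the Slurm version.
WANTED = ("ProctrackType", "JobAcctGatherType", "JobAcctGatherParams",
          "SLURM_VERSION")


def parse_slurm_config(text, wanted=WANTED):
    """Pick selected 'Key = Value' pairs out of `scontrol show config` output."""
    found = {}
    for line in text.splitlines():
        # Split at the first '=' only, so values containing '=' stay intact.
        key, sep, value = line.partition("=")
        if sep and key.strip() in wanted:
            found[key.strip()] = value.strip()
    return found


def read_slurm_config():
    """Run `scontrol show config` and return the settings of interest."""
    out = subprocess.run(
        ["scontrol", "show", "config"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_slurm_config(out)
```

On a node with the Slurm client tools installed, calling
`read_slurm_config()` returns a dict that can be pasted into a reply.
cgroup.conf is a separate file and still has to be copied by hand.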
Sean
--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia
On Tue, 16 Mar 2021 at 04:52, Chin,David <dwc62 at drexel.edu> wrote:
> Hi, all:
>
> I'm trying to understand why a job exited with an error condition. I think
> it was actually terminated by Slurm: the job was a MATLAB script, and its
> output was incomplete.
>
> Here's sacct output:
>
> JobID         JobName     User  Partition  NodeList  Elapsed   State       ExitCode  ReqMem  MaxRSS    MaxVMSize  AllocTRES                 AllocGRE
> ------------  ----------  ----  ---------  --------  --------  ----------  --------  ------  --------  ---------  ------------------------  --------
> 83387         ProdEmisI+  foob  def        node001   03:34:26  OUT_OF_ME+  0:125     128Gn                        billing=16,cpu=16,node=1
> 83387.batch   batch                        node001   03:34:26  OUT_OF_ME+  0:125     128Gn   1617705K  7880672K   cpu=16,mem=0,node=1
> 83387.extern  extern                       node001   03:34:26  COMPLETED   0:0       128Gn   460K      153196K    billing=16,cpu=16,node=1
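
To put the figures above on one scale, here is a small sketch that converts
the sacct memory fields to bytes. It assumes sacct's K/M/G/T suffixes are
binary multiples (1024-based) and that the trailing `n` or `c` on ReqMem
marks a per-node or per-CPU request. `UNITS` and `to_bytes` are illustrative
names, not part of Slurm.

```python
# Assumed binary multipliers for the unit suffixes sacct prints.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(field):
    """Convert a sacct memory field such as '1617705K' or '128Gn' to bytes."""
    # ReqMem carries a trailing 'n' (per node) or 'c' (per CPU); drop it.
    field = field.rstrip("nc")
    return int(float(field[:-1]) * UNITS[field[-1]])


req_mem = to_bytes("128Gn")      # ReqMem for job 83387
max_rss = to_bytes("1617705K")   # MaxRSS of 83387.batch
max_vm = to_bytes("7880672K")    # MaxVMSize of 83387.batch

GIB = 1024**3
print(f"ReqMem    = {req_mem / GIB:.0f} GiB")
print(f"MaxRSS    = {max_rss / GIB:.2f} GiB")
print(f"MaxVMSize = {max_vm / GIB:.2f} GiB")
```

Under those assumptions the batch step's recorded MaxRSS is roughly 1.5 GiB
and its MaxVMSize roughly 7.5 GiB, both far below the 128 GiB request, which
is the mismatch the question is about.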
>
> Thanks in advance,
> Dave
>
> --
> David Chin, PhD (he/him) Sr. SysAdmin, URCF, Drexel
> dwc62 at drexel.edu 215.571.4335 (o)
> For URCF support: urcf-support at drexel.edu
> https://proteusmaster.urcf.drexel.edu/urcfwiki
> github:prehensilecode