[slurm-users] Can't run jobs after upgrade to 17.11.5 due to memory?

Eli V eliventer at gmail.com
Tue Jun 12 07:34:41 MDT 2018


Yes, I saw the same issue. The default for an unset DefMemPerCPU changed
from unlimited in earlier versions to 0. I just set it to 384 (MB) in
slurm.conf so that simple things run fine, and I make sure users always
set a sane value on submission.
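
For anyone hitting the same thing, here is a rough sketch of that
workaround. The 384 is just the value I picked (DefMemPerCPU is given in
megabytes); job.sh and the 2G request are placeholders for illustration,
not anything from the cluster discussed below:

```
# slurm.conf (illustrative fragment)
# Default memory per allocated CPU, in MB, applied when a job does not
# request memory itself.
DefMemPerCPU=384

# After editing, have the daemons reread the config (or restart
# slurmctld) and check what the controller now reports:
#   scontrol reconfigure
#   scontrol show config | grep -i MemPer

# User side: request memory explicitly at submission.
# (job.sh is a placeholder script name.)
#   sbatch --mem-per-cpu=2G job.sh
```

The default only needs to be big enough for trivial jobs; anything
heavier should carry its own --mem or --mem-per-cpu request.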

On Mon, Jun 11, 2018 at 6:40 PM, Roberts, John E. <jeroberts at anl.gov> wrote:
> I see this in the debug logs:
> "memory per node set to 1M in partition bdwall"
>
> I seemingly can alleviate this if I set RealMemory=foo in the Node definitions, but this just seems like something that shouldn't be necessary.
> Did this become a required field after 16.05??
>
> Thanks!
> John
>
> On 6/11/18, 4:12 PM, "Roberts, John E." <jeroberts at anl.gov> wrote:
>
>     Nothing that looks incorrect, I assume:
>
>     DefMemPerNode           = UNLIMITED
>     MaxMemPerNode           = UNLIMITED
>     MemLimitEnforce         = Yes
>     PropagateResourceLimitsExcept = MEMLOCK
>
>     CPU vars aren't set and never were.
>
>     Thanks!
>     John
>
>     On 6/11/18, 4:09 PM, "slurm-users on behalf of Renfro, Michael" <slurm-users-bounces at lists.schedmd.com on behalf of Renfro at tntech.edu> wrote:
>
>         Anything in particular set for DefMemPerCPU in your slurm.conf?
>
>         > On Jun 11, 2018, at 3:50 PM, Roberts, John E. <jeroberts at anl.gov> wrote:
>         >
>         > Hi,
>         >
>         >    Seeing this after an upgrade today. I now can't get any jobs to run. Things were fine before the upgrade. Any ideas?
>         >
>         >    slurmstepd: error: Job 535721 exceeded memory limit (1160 > 1024), being killed
>         >    slurmstepd: error: Exceeded job memory limit


