[slurm-users] Can't run jobs after upgrade to 17.11.5 due to memory?

Roberts, John E. jeroberts at anl.gov
Mon Jun 11 16:40:53 MDT 2018


I see this in the debug logs:
"memory per node set to 1M in partition bdwall"

I can seemingly work around this by setting RealMemory=foo in the node definitions, but that seems like something that shouldn't be necessary.
Did this become a required field after 16.05?
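
For reference, here is a minimal sketch of that workaround in slurm.conf. The node names, CPU count, and memory value are made-up placeholders, not our actual configuration. RealMemory is given in megabytes and, as far as I can tell from the slurm.conf man page, defaults to 1 when omitted, which would match the "1M" in the log line above. Running "slurmd -C" on a compute node prints the values that slurmd detects, which could serve as a starting point.

```
# slurm.conf fragment (hypothetical node names and sizes)
# RealMemory is in megabytes; when left out it defaults to 1.
NodeName=bdw-[0001-0004] CPUs=36 RealMemory=128000 State=UNKNOWN
PartitionName=bdwall Nodes=bdw-[0001-0004] State=UP
```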

Thanks!
John 

On 6/11/18, 4:12 PM, "Roberts, John E." <jeroberts at anl.gov> wrote:

    Nothing that I'd assume is incorrect:
    
    DefMemPerNode           = UNLIMITED
    MaxMemPerNode           = UNLIMITED
    MemLimitEnforce         = Yes
    PropagateResourceLimitsExcept = MEMLOCK
    
    CPU vars aren't set and never were.
    
    Thanks!
    John 
    
    On 6/11/18, 4:09 PM, "slurm-users on behalf of Renfro, Michael" <slurm-users-bounces at lists.schedmd.com on behalf of Renfro at tntech.edu> wrote:
    
        Anything in particular set for DefMemPerCPU in your slurm.conf?
        
        > On Jun 11, 2018, at 3:50 PM, Roberts, John E. <jeroberts at anl.gov> wrote:
        > 
        > Hi,
        > 
        >    Seeing this after an upgrade today. I now can't get any jobs to run. Things were fine before the upgrade. Any ideas?
        > 
        >    slurmstepd: error: Job 535721 exceeded memory limit (1160 > 1024), being killed
        >    slurmstepd: error: Exceeded job memory limit
        
        
        
    
    


