[slurm-users] Can't run jobs after upgrade to 17.11.5 due to memory?

Roberts, John E. jeroberts at anl.gov
Mon Jun 11 14:50:43 MDT 2018


Hi,
    
    I'm seeing this after an upgrade today, and I now can't get any jobs to run. Things were fine before the upgrade. Any ideas?
    
    slurmstepd: error: Job 535721 exceeded memory limit (1160 > 1024), being killed
    slurmstepd: error: Exceeded job memory limit
    
    ulimit shows:
    $ ulimit -a | grep -i mem
    max locked memory       (kbytes, -l) unlimited
    max memory size         (kbytes, -m) unlimited
    virtual memory          (kbytes, -v) unlimited
    
    but ulimit run through Slurm shows:
    $ srun bash -c "ulimit -a" | grep -i mem
    max locked memory       (kbytes, -l) unlimited
    max memory size         (kbytes, -m) 1024
    virtual memory          (kbytes, -v) unlimited
    
    This is CentOS 7, and the slurmd unit file sets:
    $ grep -i mem /etc/systemd/system/multi-user.target.wants/slurmd.service
    LimitMEMLOCK=infinity
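    
    To compare against the above, the memory-related settings in the running Slurm config could be listed with something like the sketch below. The helper name mem_settings is hypothetical, and the parameter prefixes are only my guess at what might be relevant here, not a confirmed cause. No output is shown because I have not captured it.

```shell
# Hypothetical helper (not part of Slurm): keep only the memory-related
# lines from `scontrol show config` output read on stdin.
mem_settings() {
    grep -i -E '^(DefMemPer|MaxMemPer|JobAcctGather|SelectTypeParameters|TaskPlugin)'
}

# Usage on a host with Slurm installed:
#   scontrol show config | mem_settings
```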
    
    Thanks!
    --
    John Roberts
    HPC Systems Administrator


