[slurm-users] "Low RealMem" after upgrade

Brian Andrus toomuchit at gmail.com
Fri Oct 1 17:46:53 UTC 2021


Not unusual. You should set RealMemory in slurm.conf a bit below what 
"slurmd -C" reports.

Upgraded kernel modules may use a little more memory, which causes 
exactly this situation. There are other causes as well, but giving the 
kernel/system some wiggle room avoids most of them.

Also helps with OOM killer situations.
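
As a rough sketch of the "a bit below what slurmd reports" idea: the 
snippet below reads the RealMemory field from "slurmd -C" output, 
subtracts some headroom and rounds down. The function names, the 2% 
headroom and the rounding to 100 MB are assumptions made for 
illustration, not Slurm recommendations; pick values that suit your 
nodes.

```python
import re


def parse_real_memory(slurmd_c_output: str) -> int:
    """Return the RealMemory value (in MB) found in `slurmd -C` output."""
    match = re.search(r"\bRealMemory=(\d+)", slurmd_c_output)
    if match is None:
        raise ValueError("no RealMemory field found in slurmd -C output")
    return int(match.group(1))


def suggested_real_memory(detected_mb: int,
                          headroom_percent: int = 2,
                          round_to_mb: int = 100) -> int:
    """Subtract headroom and round down to a multiple of round_to_mb.

    headroom_percent and round_to_mb are illustrative defaults only.
    """
    with_headroom = detected_mb * (100 - headroom_percent) // 100
    return (with_headroom // round_to_mb) * round_to_mb


# Sample taken from the "slurmd -C" output quoted below.
sample = (
    "NodeName=str957-bl0-01 CPUs=24 Boards=1 SocketsPerBoard=2 "
    "CoresPerSocket=6 ThreadsPerCore=2 RealMemory=64378\n"
    "UpTime=0-00:37:17\n"
)

detected = parse_real_memory(sample)
print(f"detected={detected} suggested RealMemory={suggested_real_memory(detected)}")
```

The suggested number is what would go into the RealMemory= field of the 
NodeName line in slurm.conf, so a small drop in detected memory after 
an upgrade does not push the node below its configured value.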

Brian Andrus

On 10/1/2021 1:22 AM, Diego Zuccato wrote:
> Hello all.
>
> I just upgraded to Debian 11 that brings Slurm 21.08 and the newer 
> nodes upgraded w/o too many issues (just minor config changes, one 
> being RealMemory value in slurm.conf, since for some reason it seems 
> the new slurmd detects about 12MB less memory than before).
>
> But the older nodes are still marked IDLE+DRAIN:
> -8<--
> NodeName=str957-bl0-01 Arch=x86_64 CoresPerSocket=6
>    CPUAlloc=0 CPUTot=24 CPULoad=0.39
>    AvailableFeatures=ib,blade,intel,avx
>    ActiveFeatures=ib,blade,intel,avx
>    Gres=(null)
>    NodeAddr=str957-bl0-01 NodeHostName=str957-bl0-01 Version=20.11.4
>    OS=Linux 5.10.0-8-amd64 #1 SMP Debian 5.10.46-5 (2021-09-23)
>    RealMemory=64000 AllocMem=0 FreeMem=63518 Sockets=2 Boards=1
>    MemSpecLimit=2048
>    State=IDLE+DRAIN ThreadsPerCore=2 TmpDisk=0 Weight=2 Owner=N/A MCS_label=N/A
>    Partitions=b1
>    BootTime=2021-10-01T09:35:42 SlurmdStartTime=2021-10-01T09:36:15
>    CfgTRES=cpu=24,mem=62.50G,billing=182
>    AllocTRES=
>    CapWatts=n/a
>    CurrentWatts=0 AveWatts=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>    Reason=Low RealMemory [root@2021-10-01T08:08:18]
>    Comment=(null)
> -8<--
> I already reduced the RealMemory value in slurm.conf and restarted both 
> slurmctld and slurmd (in case "scontrol reconfigure" was not enough... 
> not really clear from the docs).
>
> The relevant lines in slurm.conf are:
> -8<--
> NodeName=DEFAULT Sockets=2 ThreadsPerCore=2 State=UNKNOWN MemSpecLimit=2048
> NodeName=str957-bl0-0[1-2] CoresPerSocket=6 RealMemory=64000 Weight=2 Feature=ib,blade,intel,avx
> -8<--
>
> And the node says:
> -8<--
> root@str957-bl0-01:~# slurmd -C
> NodeName=str957-bl0-01 CPUs=24 Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=64378
> UpTime=0-00:37:17
> -8<--
>
> I also tried lowering RealMemory setting to 60000, in case 
> MemSpecLimit interfered, but the result remains the same.
>
> Any ideas?
>
> TIA!
>


