[slurm-users] Node node00x has low real_memory size & slurm_rpc_node_registration node=node003: Invalid argument

Robert Kudyba rkudyba at fordham.edu
Tue Jan 21 15:59:38 UTC 2020


>
>
> Are you sure your 24-core nodes have 187 TERABYTES of memory?
>
> As you yourself cited:
>
> Size of real memory on the node in megabytes
>
> The settings in your slurm.conf:
>
> NodeName=node[001-003]  CoresPerSocket=12 RealMemory=196489092 Sockets=2
> Gres=gpu:1
>
> So your machines would need 196489092 megabytes of memory, which is ~191884
> gigabytes or ~187 terabytes.
>

No, they have 192 GB.

This error was also throwing me off:
error: _slurm_rpc_node_registration node=node003: Invalid argument

"Invalid argument" in this case appears to mean that RealMemory is set too high.

> It sees only 191840 megabytes, which is still less than the 191884. Since
> the available memory changes slightly from OS version to OS version, I
> would suggest to set RealMemory to less than 191840, e.g. 191800.
> But Brian already told you to reduce the RealMemory:
>
> I would suggest RealMemory=191879 , where I suspect you have
> RealMemory=196489092
>
>
Thanks, Marcus and Brian. That was indeed the culprit.
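
For anyone hitting the same error, here is a minimal sketch of the arithmetic from this thread. It assumes the original 196489092 was a kilobyte count (for example MemTotal from /proc/meminfo) pasted into RealMemory, which slurm.conf reads as megabytes; the thread does not confirm where the number came from. The function names and the 40 MB margin are illustrative only, chosen to reproduce the 191840 -> 191800 suggestion quoted above.

```python
# Sketch of the unit mix-up discussed in this thread.
# Assumption: 196489092 was a kilobyte value; RealMemory expects megabytes.

UNITS_PER_STEP = 1024  # kB -> MB -> GB -> TB, each step divides by 1024


def kb_to_mb(kilobytes: int) -> int:
    """Convert kilobytes to whole megabytes, rounding down."""
    return kilobytes // UNITS_PER_STEP


def mb_to_tb(megabytes: int) -> float:
    """Convert megabytes to terabytes (MB -> GB -> TB)."""
    return megabytes / UNITS_PER_STEP / UNITS_PER_STEP


def suggested_real_memory(detected_mb: int, margin_mb: int = 40) -> int:
    """Leave some headroom below what the node reports.

    The 40 MB default is a hypothetical margin, not a Slurm recommendation.
    """
    return detected_mb - margin_mb


configured = 196489092  # value that was in slurm.conf

# Read as megabytes (what Slurm does), this is an absurd amount of memory.
print(f"as MB: {mb_to_tb(configured):.2f} TB")  # as MB: 187.39 TB

# Read as kilobytes, it matches a 192 GB node.
print(f"as kB: {kb_to_mb(configured)} MB")  # as kB: 191883 MB

# The node reported 191840 MB, so stay a little below that.
print(f"RealMemory={suggested_real_memory(191840)}")  # RealMemory=191800
```

Running `slurmd -C` on a compute node prints the hardware configuration that slurmd detects, including RealMemory in megabytes, which gives a starting value to round down from.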