<div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><br>Are you sure your 24-core nodes have 187 TERABYTES of memory?<br>
<br>
As you yourself cited:<br>
<blockquote type="cite">Size of real memory on the node in megabytes</blockquote>
The settings in your slurm.conf:<br>
<blockquote type="cite">NodeName=node[001-003] CoresPerSocket=12
RealMemory=196489092 Sockets=2 Gres=gpu:1<br>
</blockquote>
So, according to this, your machines should have 196489092 megabytes of memory, which is
~191884 gigabytes or ~187 terabytes.<br></div></blockquote><div><br></div><div>192 GB.</div><div><br></div><div>What was also throwing me off was this error:</div><div><span style="font-family:monospace">error: _slurm_rpc_node_registration node=node003: Invalid argument</span><br></div><div><br></div><div>"Invalid" in this case appears to mean "too high".</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>It sees only 191840 megabytes, which is still less than the 191884.
Since the available memory changes slightly from OS version to OS
version, I would suggest setting RealMemory to less than 191840, e.g.
191800.<br>
But Brian already told you to reduce the RealMemory:<br>
<blockquote type="cite">I would suggest RealMemory=191879 , where I
suspect you have RealMemory=196489092</blockquote></div></blockquote><div><br></div><div>Thanks, Marcus and Brian; that was indeed the culprit.</div></div></div>
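<div><br></div><div>For anyone hitting the same error, the unit mix-up discussed in this thread can be checked with a few lines of Python. This is only a sketch: it assumes the original value was a kilobyte figure (for example, MemTotal in /proc/meminfo is reported in kB) that ended up in RealMemory, which expects megabytes. The helper names and the 40 MB safety margin are hypothetical; the margin simply reproduces the 191800 suggested above.</div>

```python
# Sketch only: helper names and the margin are illustrative assumptions,
# not part of Slurm. RealMemory in slurm.conf is given in megabytes.

CONFIGURED_REAL_MEMORY = 196489092  # value from the slurm.conf quoted in this thread
DETECTED_MB = 191840                # megabytes the node actually sees, per the thread


def mb_to_tb(megabytes):
    """Convert megabytes to terabytes (1024-based)."""
    return megabytes / 1024 / 1024


def kb_to_mb(kilobytes):
    """Convert kilobytes to whole megabytes, rounding down."""
    return kilobytes // 1024


def suggest_real_memory(detected_mb, margin_mb=40):
    """Return a RealMemory value slightly below what the node reports.

    The margin is an arbitrary assumption to absorb small differences
    in available memory between OS versions.
    """
    return detected_mb - margin_mb


if __name__ == "__main__":
    # Read as megabytes, the configured value is absurdly large.
    print(f"As megabytes: ~{mb_to_tb(CONFIGURED_REAL_MEMORY):.0f} TB")
    # Read as kilobytes, it lands close to what the node reports.
    print(f"As kilobytes: {kb_to_mb(CONFIGURED_REAL_MEMORY)} MB")
    print(f"Suggested RealMemory: {suggest_real_memory(DETECTED_MB)}")
```

<div>Running <span style="font-family:monospace">slurmd -C</span> on a node prints the hardware configuration that slurmd detects, including RealMemory, which is a convenient starting point for the value to put in slurm.conf.</div>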