<div dir="ltr">More investigation and your message have confirmed for me that it's all working in powers of 1024 (which is what I would expect, although the use of the word 'megabytes' in the doc is a little misleading, I think...).<br><br>So, our nodes have 187GiB of total memory, and we need to re-jig our user documentation to reflect that 187G is the requestable cap for running on the '192GB' nodes.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 6 May 2020 at 11:12, Peter Kjellström <<a href="mailto:cap@nsc.liu.se" target="_blank">cap@nsc.liu.se</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Wed, 6 May 2020 10:42:46 +0100<br>
Killian Murphy <<a href="mailto:killian.murphy@york.ac.uk" target="_blank">killian.murphy@york.ac.uk</a>> wrote:<br>
<br>
> Hi all.<br>
> <br>
> I'm probably making a rookie error here...which 'megabyte' (powers of<br>
> 1000 or 1024) does the Slurm documentation refer to in, for example,<br>
> the slurm.conf documentation for RealMemory and the sbatch<br>
> documentation for `--mem`?<br>
> <br>
> Most of our nodes have the same physical memory configuration. From<br>
> the output of `free -m` and `slurmd -C` on one of the nodes, we have<br>
> 191668M (187G). Consequently, `RealMemory` for those nodes has been<br>
> set to 191668 in slurm.conf. As a result, when a user requests memory<br>
> above '187G' for node memory, Slurm reports to them that the<br>
> requested node configuration is not available.<br>
<br>
Well yeah, it's all base 2. But it seems you have two problems: 1) the<br>
units, and 2) users expecting 192GiB out of your nodes when the actual<br>
available memory is always lower (187G in your case).<br>
<br>
We see #2 also in that users know our thin nodes are 96G and some then<br>
proceed to request 96G (which does not fit on the 96G nodes...).<br>
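<br>
A rough sketch of that arithmetic, assuming Slurm simply compares the<br>
request (in MiB) against RealMemory and ignoring anything else that may<br>
be reserved on the node. The helper names are hypothetical; 191668 is<br>
the RealMemory value quoted in this thread:<br>

```python
MIB_PER_GIB = 1024  # Slurm's "G" suffix behaves as GiB, per this thread


def max_whole_gib(real_memory_mib: int) -> int:
    """Largest whole-GiB request that still fits in RealMemory (MiB)."""
    return real_memory_mib // MIB_PER_GIB


def fits(request_gib: int, real_memory_mib: int) -> bool:
    """True if a request of request_gib GiB is no larger than RealMemory."""
    return request_gib * MIB_PER_GIB <= real_memory_mib


real_memory = 191668  # RealMemory from slurm.conf on the '192GB' nodes

print(max_whole_gib(real_memory))  # 187
print(fits(187, real_memory))      # True
print(fits(192, real_memory))      # False: 192 GiB is 196608 MiB
```

So the figure to put in user documentation is the whole-GiB floor of<br>
RealMemory, not the nominal size of the installed memory.<br>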
<br>
From our system:<br>
$ LOCALINTERACTIVECOMMAND interactive --mem=3g -n 1<br>
...<br>
$ cat /sys/fs/cgroup/memory/slurm/uid_x/job_y/memory.limit_in_bytes<br>
3221225472 # 3.0 GiB<br>
<br>
(this on slurm-18.08.8 with mem cgroups)<br>
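<br>
The same check can be sketched in a few lines, assuming the K/M/G/T<br>
suffixes are powers of 1024 and that a bare number means megabytes.<br>
to_bytes is a hypothetical helper for illustration, not Slurm's own<br>
parser:<br>

```python
SUFFIX_POWERS = {"k": 1, "m": 2, "g": 3, "t": 4}


def to_bytes(spec: str) -> int:
    """Convert a --mem style value such as '3g' to bytes, base 1024."""
    spec = spec.strip().lower()
    if spec[-1] in SUFFIX_POWERS:
        return int(spec[:-1]) * 1024 ** SUFFIX_POWERS[spec[-1]]
    # Assumption: no suffix means megabytes, treated here as MiB.
    return int(spec) * 1024 ** 2


cgroup_limit = 3221225472  # memory.limit_in_bytes from the job above

print(to_bytes("3g") == cgroup_limit)   # True: 3 * 1024**3
print(3 * 1000 ** 3 == cgroup_limit)    # False: not powers of 1000
```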
<br>
/Peter<br>
<br>
> Only...we have 191668MiB of system memory, not MB. `free -m --si` (use<br>
> powers of 1000, not 1024) reports 192628MB of system memory (which,<br>
> frustratingly, indicates that the 'free' documentation is also not<br>
> using the newer unit names). So it seems as though Slurm is working<br>
> in powers of 1024, not powers of 1000.<br>
> <br>
> I'm probably just confused about the unit definitions, or there is<br>
> some convention being applied here, but would appreciate some<br>
> confirmation either way!<br>
> <br>
> Thanks.<br>
> <br>
> Killian<br>
<br>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr" style="font-size:12.8px">Killian Murphy</div><div dir="ltr" style="font-size:12.8px">Research Software Engineer</div><div dir="ltr" style="font-size:12.8px"><br></div><div dir="ltr" style="font-size:12.8px">Wolfson Atmospheric Chemistry Laboratories<br>University of York<br>Heslington<br>York<br>YO10 5DD<br>+44 (0)1904 32 4753<br><br>e-mail disclaimer: <a href="http://www.york.ac.uk/docs/disclaimer/email.htm" style="color:rgb(17,85,204)" target="_blank">http://www.york.ac.uk/docs/disclaimer/email.htm</a></div></div></div></div></div></div></div></div></div></div></div></div></div>