[slurm-users] Slurm memory units

Killian Murphy killian.murphy at york.ac.uk
Wed May 6 11:18:27 UTC 2020


Further investigation and your message have confirmed for me that it's all
working in powers of 1024 (which is what I would expect, although the use
of the word 'megabytes' in the docs is a little misleading, I think...).
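
For anyone who wants to check the arithmetic, the figures in this thread can be reproduced with a few lines of Python. This is only a sketch of the unit conversion; the constant and variable names are mine, not anything Slurm defines.

```python
# Binary (IEC) units: 1 MiB = 1024**2 bytes, 1 GiB = 1024**3 bytes.
MIB = 1024 ** 2
GIB = 1024 ** 3

# Peter's job asked for --mem=3g and the cgroup limit was 3221225472 bytes.
cgroup_limit_bytes = 3221225472
print(cgroup_limit_bytes == 3 * GIB)        # True: matches the binary reading
print(cgroup_limit_bytes == 3 * 1000 ** 3)  # False: not the decimal reading

# RealMemory on our nodes is 191668, read as MiB; this is where '187G' comes from.
real_memory_mib = 191668
print(real_memory_mib * MIB / GIB)          # 187.17578125
```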

So, our nodes have 187GiB of total memory as far as Slurm is concerned, and
we need to re-jig our user documentation to reflect that 187G is the most
that can be requested when running on the '192GB' nodes.
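
To show why 187G is the cap, here is a rough sketch of the comparison, assuming the K/M/G/T suffixes on `--mem` are all powers of 1024 and that a bare number means megabytes (the documented default unit). The helpers `request_to_mib` and `fits` are hypothetical names for illustration; this is not how Slurm itself is implemented, and it ignores anything else the scheduler takes into account.

```python
import re

# Multipliers relative to MiB, assuming Slurm's suffixes are binary.
_TO_MIB = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 ** 2}


def request_to_mib(request: str) -> float:
    """Convert a --mem style string such as '187G' or '191668' to MiB.

    A bare number is treated as megabytes (really MiB).
    """
    match = re.fullmatch(r"(\d+)([KMGT]?)", request.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised memory request: {request!r}")
    value, suffix = match.groups()
    return int(value) * _TO_MIB[suffix or "M"]


def fits(request: str, real_memory_mib: int = 191668) -> bool:
    """True if the request is no larger than the node's RealMemory."""
    return request_to_mib(request) <= real_memory_mib


for req in ("187G", "188G", "192G", "191668M"):
    print(req, fits(req))
# 187G True
# 188G False
# 192G False
# 191668M True
```

So a user who wants every last bit of a node can ask for `--mem=191668M`, but anything from 188G upwards is larger than RealMemory and gets rejected.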

On Wed, 6 May 2020 at 11:12, Peter Kjellström <cap at nsc.liu.se> wrote:

> On Wed, 6 May 2020 10:42:46 +0100
> Killian Murphy <killian.murphy at york.ac.uk> wrote:
>
> > Hi all.
> >
> > I'm probably making a rookie error here...which 'megabyte' (powers of
> > 1000 or 1024) does the Slurm documentation refer to in, for example,
> > the slurm.conf documentation for RealMemory and the sbatch
> > documentation for `--mem`?
> >
> > Most of our nodes have the same physical memory configuration. From
> > the output of `free -m` and `slurmd -C` on one of the nodes, we have
> > 191668M (187G). Consequently, `RealMemory` for those nodes has been
> > set to 191668 in slurm.conf. As a result, when a user requests memory
> > above '187G' for node memory, Slurm reports to them that the
> > requested node configuration is not available.
>
> Well yeah, it's all base-2. But it seems you have two problems: 1) the
> units, and 2) users expecting 192GiB out of your nodes when the actual
> available memory is always lower (187G in your case).
>
> We see #2 also in that users know our thin nodes are 96G and some then
> proceed to request 96G (which does not fit on the 96G nodes...).
>
> From our system:
>  $ LOCALINTERACTIVECOMMAND interactive --mem=3g -n 1
>  ...
>  $ cat /sys/fs/cgroup/memory/slurm/uid_x/job_y/memory.limit_in_bytes
>  3221225472 # 3.0 GiB
>
> (this on slurm-18.08.8 with mem cgroups)
>
> /Peter
>
> > Only...we have 191668MiB of system memory, not MB. `free -m --si` (use
> > powers of 1000, not 1024) reports 192628MB of system memory (which,
> > frustratingly, indicates that the 'free' documentation is also not
> > using the newer unit names). So it seems as though Slurm is working
> > in powers of 1024, not powers of 1000.
> >
> > I'm probably just confused about the unit definitions, or there is
> > some convention being applied here, but would appreciate some
> > confirmation either way!
> >
> > Thanks.
> >
> > Killian
>
>

-- 
Killian Murphy
Research Software Engineer

Wolfson Atmospheric Chemistry Laboratories
University of York
Heslington
York
YO10 5DD
+44 (0)1904 32 4753
