[slurm-users] Slurm 18.08.8 --mem-per-cpu + --exclusive = strange behavior

Bjørn-Helge Mevik b.h.mevik at usit.uio.no
Thu Dec 12 08:47:36 UTC 2019


Beatrice Charton <beatrice.charton at criann.fr> writes:

> Hi,
>
> We have a strange behaviour of Slurm after updating from 18.08.7 to
> 18.08.8, for jobs using --exclusive and --mem-per-cpu.
>
> Our nodes have 128GB of memory, 28 cores.
> 	$ srun  --mem-per-cpu=30000 -n 1  --exclusive  hostname
> => works in 18.08.7 
> => doesn’t work in 18.08.8

I'm actually surprised it _worked_ in 18.08.7.  At some point, long
before 18.08, the behaviour of --exclusive was changed: in order to
account the job for all CPUs on the node, the number of CPUs asked for
with --ntasks is simply multiplied by "#cpus-on-node / --ntasks" (in
your case: 28).  Unfortunately, that also means that the memory the
job requests per node is "#cpus-on-node / --ntasks" multiplied by
--mem-per-cpu (in your case 28 * 30000 MiB ~= 820 GiB, far more than
the 128 GB your nodes have).  For this reason, we tend to ban
--exclusive on our clusters (or at least warn about it).
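To spell out the arithmetic (a rough sketch using the numbers from
your mail, i.e. 28 cores and 30000 MiB per CPU):

	$ echo $(( 28 * 30000 ))   # effective per-node memory request, in MiB
	840000

840000 MiB is roughly 820 GiB, so the job can never start on a 128 GB
node.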

I haven't looked at the code for a long time, so I don't know whether
this is still the current behaviour, but every time I've tested, I've
seen the same problem.  I believe I last tested on 19.05 (but I might
be misremembering).
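
If you want to see what Slurm actually records for such a job,
something like this should show it (a sketch; <jobid> is a
placeholder, and the field names are what scontrol prints on the
versions I've looked at):

	$ srun --mem-per-cpu=30000 -n 1 --exclusive sleep 60 &
	$ scontrol show job <jobid> | grep -E 'NumCPUs|MinMemoryCPU'

If NumCPUs comes back as 28 for a one-task job, the multiplication
described above is in effect.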

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo