[slurm-users] How to limit # of execution slots for a given node

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Fri Jan 7 07:15:45 UTC 2022

Hi David,

On 1/6/22 22:39, David Henkemeyer wrote:
> When my team used PBS, we had several nodes that had a TON of CPUs, so 
> many, in fact, that we ended up setting np to a smaller value, in order to 
> not starve the system of memory.
> What is the best way to do this with Slurm?  I tried modifying # of CPUs 
> in the slurm.conf file, but I noticed that Slurm enforces that "CPUs" is 
> equal to Boards * SocketsPerBoard * CoresPerSocket * ThreadsPerCore.  This 
> left me with having to "fool" Slurm into thinking there were either fewer 
> ThreadsPerCore, fewer CoresPerSocket, or fewer SocketsPerBoard.  This is a 
> less than ideal solution, it seems to me.  At least, it left me feeling 
> like there has to be a better way.
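
For illustration, the workaround described above amounts to declaring a smaller 
topology on the node's line in slurm.conf (the node name and counts below are 
hypothetical, not from the original post):

```
# Node physically has 2 sockets x 32 cores x 2 threads = 128 CPUs;
# declaring CoresPerSocket=16 makes Slurm schedule at most
# 2 * 16 * 2 = 64 CPUs on it.
NodeName=bignode01 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=512000
```

The drawback, as noted, is that the declared topology no longer matches the 
hardware, which can confuse core binding and anyone reading the config.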

If your goal is to limit the amount of RAM per job, then kernel cgroups are 
probably the answer.  I've collected some information on my Wiki page.
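
As a minimal sketch, cgroup-based memory enforcement combines the task/cgroup 
plugin in slurm.conf with a few settings in cgroup.conf (the parameter names 
below are from the standard slurm.conf and cgroup.conf man pages; this is not 
a complete configuration):

```
# slurm.conf: track and constrain jobs with cgroups,
# and schedule memory as a consumable resource
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory

# cgroup.conf: confine each job to its allocated cores and RAM
ConstrainCores=yes
ConstrainRAMSpace=yes
```

With ConstrainRAMSpace=yes, a job that exceeds its memory allocation is 
confined by the kernel rather than starving the rest of the node.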

If some users need more RAM than is available per core, they have to submit 
jobs requesting a larger number of cores to get it.  This makes a lot of 
sense, IMHO.
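
A sketch of how that per-core memory coupling is typically configured 
(partition and node names here are hypothetical):

```
# slurm.conf: each allocated CPU carries a 10000 MB memory share
PartitionName=xeon24 Nodes=x[001-100] DefMemPerCPU=10000 MaxMemPerCPU=10000 State=UP
```

A job that needs 40000 MB then has to request at least 4 cores, e.g. 
sbatch --ntasks=4 --mem-per-cpu=10000 job.sh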

SchedMD is working on support for cgroups v2; see the talk "Slurm 21.08 and 
Beyond" by Tim Wickberg, SchedMD, https://slurm.schedmd.com/publications.html

You could probably "fool" Slurm as you describe, but that shouldn't be 
necessary.

I hope this helps.
