[slurm-users] How to limit # of execution slots for a given node

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Fri Jan 7 07:15:45 UTC 2022

Hi David,

On 1/6/22 22:39, David Henkemeyer wrote:
> When my team used PBS, we had several nodes that had a TON of CPUs, so 
> many, in fact, that we ended up setting np to a smaller value, in order to 
> not starve the system of memory.
> What is the best way to do this with Slurm?  I tried modifying # of CPUs 
> in the slurm.conf file, but I noticed that Slurm enforces that "CPUs" is 
> equal to Boards * SocketsPerBoard * CoresPerSocket * ThreadsPerCore.  This 
> left me with having to "fool" Slurm into thinking there were either fewer 
> ThreadsPerCore, fewer CoresPerSocket, or fewer SocketsPerBoard.  This is a 
> less than ideal solution, it seems to me.  At least, it left me feeling 
> like there has to be a better way.
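
For illustration, the workaround described above amounts to declaring a smaller 
topology on the node's line in slurm.conf (the node name and counts below are 
hypothetical, not from the original post):

```
# Node physically has 2 sockets x 32 cores x 2 threads = 128 CPUs;
# declaring CoresPerSocket=16 makes Slurm schedule at most
# 2 * 16 * 2 = 64 CPUs on it.
NodeName=bignode01 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=512000
```

The drawback, as noted, is that the declared topology no longer matches the 
hardware, which can confuse core binding and anyone reading the config.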

If your goal is to limit the amount of RAM per job, then kernel cgroups are 
probably the answer.  I've collected some information on my Wiki page.
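
As a minimal sketch, cgroup-based memory enforcement combines the task/cgroup 
plugin in slurm.conf with a few settings in cgroup.conf (the parameter names 
below are from the standard slurm.conf and cgroup.conf man pages; this is not 
a complete configuration):

```
# slurm.conf: track and constrain jobs with cgroups,
# and schedule memory as a consumable resource
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory

# cgroup.conf: confine each job to its allocated cores and RAM
ConstrainCores=yes
ConstrainRAMSpace=yes
```

With ConstrainRAMSpace=yes, a job that exceeds its memory allocation is 
confined by the kernel rather than starving the rest of the node.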

If some users need more RAM than is available per core, they have to submit 
jobs requesting a larger number of cores to get it.  This makes a lot of 
sense, IMHO.
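
A sketch of how that per-core memory coupling is typically configured 
(partition and node names here are hypothetical):

```
# slurm.conf: each allocated CPU carries a 10000 MB memory share
PartitionName=xeon24 Nodes=x[001-100] DefMemPerCPU=10000 MaxMemPerCPU=10000 State=UP
```

A job that needs 40000 MB then has to request at least 4 cores, e.g. 
sbatch --ntasks=4 --mem-per-cpu=10000 job.sh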

SchedMD is working on support for cgroups v2; see the talk "Slurm 21.08 and 
Beyond" by Tim Wickberg, SchedMD, https://slurm.schedmd.com/publications.html

You could probably "fool" Slurm as you describe, but that shouldn't be 
necessary.

I hope this helps.
