[slurm-users] How to limit # of execution slots for a given node
Paul Edmon
pedmon at cfa.harvard.edu
Fri Jan 7 14:18:34 UTC 2022
You can actually spoof the number of cores and RAM on a node by using
the config_overrides option (SlurmdParameters=config_overrides in
slurm.conf). I've used that before for testing purposes. Mind you, core
binding and other features like that will not work if you start spoofing
the number of cores and RAM, so use with caution.
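For illustration, a minimal sketch of what that looks like in slurm.conf
(the node name and resource figures are invented; on older Slurm releases
the equivalent knob was FastSchedule=2):

    # Tell slurmd to trust the values in slurm.conf rather than
    # the hardware it detects at startup
    SlurmdParameters=config_overrides

    # Advertise fewer CPUs and less RAM than the node really has,
    # e.g. a 128-core / 512 GB machine presented as 64 cores / 256 GB
    NodeName=bignode01 CPUs=64 RealMemory=262144 State=UNKNOWN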
-Paul Edmon-
On 1/7/2022 2:36 AM, Rémi Palancher wrote:
> On Thursday, January 6, 2022 at 22:39, David Henkemeyer <david.henkemeyer at gmail.com> wrote:
>
>> All,
>>
>> When my team used PBS, we had several nodes that had a TON of CPUs, so many, in fact, that we ended up setting np to a smaller value, in order to not starve the system of memory.
>>
>> What is the best way to do this with Slurm? I tried modifying the # of CPUs in the slurm.conf file, but I noticed that Slurm enforces that "CPUs" is equal to Boards * SocketsPerBoard * CoresPerSocket * ThreadsPerCore. This left me having to "fool" Slurm into thinking there were either fewer ThreadsPerCore, fewer CoresPerSocket, or fewer SocketsPerBoard. This is a less-than-ideal solution, it seems to me. At least, it left me feeling like there has to be a better way.
> I'm not sure you can lie to Slurm about the real number of CPUs on the nodes.
>
> If you want to prevent Slurm from allocating more than n CPUs on these nodes, where n is below their total CPU count, I guess one solution is to use MaxCPUsPerNode=n at the partition level.
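> For instance, a sketch with hypothetical partition and node names:
>
>     PartitionName=batch Nodes=bignode[01-04] MaxCPUsPerNode=64 State=UP
>
> Slurm will then never allocate more than 64 CPUs on any one of these nodes to jobs from this partition, regardless of what the hardware reports.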
>
> You can also mask "system" CPUs with CpuSpecList at the node level.
>
> The latter is better if you need fine-grained control over the exact list of reserved CPUs with respect to NUMA topology or whatever.
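> Again just a sketch, assuming a 128-core node where the first 16 CPUs should stay reserved for the system (the IDs and sizes are invented, and actually enforcing the isolation depends on your task plugin configuration):
>
>     NodeName=bignode01 CPUs=128 CpuSpecList=0-15 RealMemory=524288
>
> If you only need a count of reserved cores rather than specific CPU IDs, CoreSpecCount is the simpler variant.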
>
> --
> Rémi Palancher
> Rackslab: Open Source Solutions for HPC Operations
> https://rackslab.io