[slurm-users] Reserve some cores per GPU
Relu Patrascu
relu at cs.toronto.edu
Tue Oct 20 17:17:42 UTC 2020
Hi all,
We have a GPU cluster and run into this issue occasionally. Assume
four GPUs per node: when a user requests one GPU on such a node along
with all the cores, or all the RAM, the other three GPUs are wasted for
the duration of the job, because slurm has no cores or RAM left to
allocate to subsequent jobs on those GPUs.
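For illustration, a single submission like the one below (the script
name is just a placeholder) takes one of the four GPUs but all 16 cores
and, with --mem=0, all of the node's memory:

  # one GPU, but every core and all memory on the node;
  # the other three GPUs are stranded until this job ends
  sbatch --gres=gpu:1 --cpus-per-task=16 --mem=0 train.sh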
We have a "soft" solution to this, but it's not ideal: we assigned
large TresBillingWeights to CPU consumption, which discourages users
from allocating many CPUs.
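For reference, we set the weights per partition in slurm.conf roughly
like this (the node names and the actual weights here are illustrative):

  # bill CPUs heavily relative to GPUs so CPU-hungry jobs cost more
  PartitionName=gpu Nodes=gpu[01-04] TresBillingWeights="CPU=8.0,GRES/gpu=1.0"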
Ideal for us would be the ability to define, per GPU, a number of CPUs
that must always stay available on the node. A similar feature for an
amount of RAM would also help.
Take for example a node that has:
* four GPUs
* 16 CPUs
Let's assume that most jobs would work just fine with a minimum of
2 CPUs per GPU. Then we could set in the node definition a variable
such as

CpusReservedPerGpu = 2

The first job to run on this node could get between 2 and 10 CPUs,
leaving 6 CPUs (2 for each of the three remaining GPUs) for potential
incoming jobs.
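In slurm.conf terms, the node definition might look like the sketch
below. CpusReservedPerGpu and MemReservedPerGpu are the parameters we
wish existed, not real options, and the node name and memory figure are
placeholders:

  # hypothetical: always keep 2 CPUs and 8 GB of RAM free per GPU
  NodeName=gpu01 CPUs=16 RealMemory=128000 Gres=gpu:4 CpusReservedPerGpu=2 MemReservedPerGpu=8000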
We couldn't find a way to do this; are we missing something? We'd
rather not modify the source code again :/
Regards,
Relu