[slurm-users] Reserve some cores per GPU
Relu Patrascu
relu at cs.toronto.edu
Tue Oct 20 17:17:42 UTC 2020
Hi all,
We have a GPU cluster and run into this issue occasionally. Assume
four GPUs per node: when a user requests one GPU on such a node along
with all the cores, or all the RAM, the other three GPUs are wasted for
the duration of the job, because slurm has no cores or RAM left to
allocate to subsequent jobs on those GPUs.
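For illustration, a single submission like the one below (the script
name is just a placeholder) takes one of the four GPUs but all 16 cores
and, with --mem=0, all of the node's memory:

  # one GPU, but every core and all memory on the node;
  # the other three GPUs are stranded until this job ends
  sbatch --gres=gpu:1 --cpus-per-task=16 --mem=0 train.sh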
We have a "soft" solution to this, but it's not ideal: we assigned
large TresBillingWeights to CPU consumption, which discourages users
from allocating many CPUs.
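For reference, we set the weights per partition in slurm.conf roughly
like this (the node names and the actual weights here are illustrative):

  # bill CPUs heavily relative to GPUs so CPU-hungry jobs cost more
  PartitionName=gpu Nodes=gpu[01-04] TresBillingWeights="CPU=8.0,GRES/gpu=1.0"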
Ideal for us would be the ability to define, per GPU, a number of CPUs
that must always stay available on the node. A similar feature for an
amount of RAM would also help.
Take for example a node that has:
* four GPUs
* 16 CPUs
Let's assume that most jobs would work just fine with a minimum of
2 CPUs per GPU. Then we could set in the node definition a variable
such as

CpusReservedPerGpu = 2

The first job to run on this node could get between 2 and 10 CPUs,
leaving 6 CPUs (2 for each of the three remaining GPUs) for potential
incoming jobs.
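In slurm.conf terms, the node definition might look like the sketch
below. CpusReservedPerGpu and MemReservedPerGpu are the parameters we
wish existed, not real options, and the node name and memory figure are
placeholders:

  # hypothetical: always keep 2 CPUs and 8 GB of RAM free per GPU
  NodeName=gpu01 CPUs=16 RealMemory=128000 Gres=gpu:4 CpusReservedPerGpu=2 MemReservedPerGpu=8000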
We couldn't find a way to do this; are we missing something? We'd
rather not modify the source code again :/
Regards,
Relu