[slurm-users] Reserve some cores per GPU

Stephan Schott schottve at hhu.de
Wed Oct 21 12:46:42 UTC 2020


This is related to this other thread:
https://groups.google.com/g/slurm-users/c/88pZ400whu0/m/9FYFqKh6AQAJ
AFAIK, the only rudimentary solution is the MaxCPUsPerNode partition flag,
combined with setting up independent gpu and cpu partitions, but having
something like "CpusReservedPerGpu" would be nice.

@Aaron would you be willing to share such a script?
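In case it helps others, here is a rough sketch of what I imagine such a
job_submit.lua could look like. To be clear, this is my guess, not Aaron's
actual script: the node names, the max_cores_per_gpu table, and the way the
GPU count is parsed out of tres_per_node are all assumptions on my part.

```lua
-- Hypothetical table of "weak" nodes: max CPU cores allowed per GPU.
local max_cores_per_gpu = {
   gpunode01 = 2,   -- illustrative node names and limits
   gpunode02 = 4,
}

function slurm_job_submit(job_desc, part_list, submit_uid)
   -- Only consider jobs that actually request GPUs.
   local ngpus = tonumber(string.match(job_desc.tres_per_node or "",
                                       "gpu:(%d+)")) or 0
   if ngpus == 0 then
      return slurm.SUCCESS
   end

   local cpus = job_desc.min_cpus or 1
   local cores_per_gpu = cpus / ngpus

   -- Collect nodes whose per-GPU core budget the job would exceed.
   local excluded = {}
   for node, limit in pairs(max_cores_per_gpu) do
      if cores_per_gpu > limit then
         table.insert(excluded, node)
      end
   end

   -- Append them to the job's excluded-node list.
   if #excluded > 0 then
      local list = table.concat(excluded, ",")
      if job_desc.exc_nodes and job_desc.exc_nodes ~= "" then
         job_desc.exc_nodes = job_desc.exc_nodes .. "," .. list
      else
         job_desc.exc_nodes = list
      end
   end

   return slurm.SUCCESS
end
```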

On Wed, 21 Oct 2020 at 00:01, Relu Patrascu (<relu at cs.toronto.edu>)
wrote:

>
> I thought of doing this, but I'm guessing you don't have preemption
> enabled. With preemption enabled this becomes more complicated and
> error prone, but I'll think some more about it. It'd be nice to
> leverage Slurm's scheduling engine and just add this constraint.
>
> Relu
>
> On 2020-10-20 16:20, Aaron Jackson wrote:
> > I look after a very heterogeneous GPU Slurm setup, and some nodes have
> > very few cores. We use a job_submit lua script which calculates the
> > number of requested CPU cores per GPU. This is then used to scan through
> > a table of 'weak nodes' based on a 'max cores per GPU' property. The
> > node names are appended to the job desc exc_nodes property.
> >
> > It's not particularly elegant but it does work quite well for us.
> >
> > Aaron
> >
> >
> > On 20 October 2020 at 18:17 BST, Relu Patrascu wrote:
> >
> >> Hi all,
> >>
> >> We have a GPU cluster and have run into this issue occasionally. Assume
> >> four GPUs per node; when a user requests one GPU on such a node, along
> >> with all the cores, or all the RAM, the other three GPUs are wasted for
> >> the duration of the job, as Slurm has no cores or RAM left to allocate
> >> alongside those GPUs for subsequent jobs.
> >>
> >>
> >> We have a "soft" solution to this, but it's not ideal: we assigned
> >> large TresBillingWeights to CPU consumption, thus discouraging users
> >> from allocating many CPUs.
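[For reference, that billing-weight workaround is configured per partition
in slurm.conf; the partition name and weights below are purely illustrative,
not Relu's actual values:]

```
# Make CPUs expensive relative to GPUs, so large CPU requests
# inflate a job's billed usage under fairshare.
PartitionName=gpu Nodes=gpunode01 TresBillingWeights="CPU=4.0,GRES/gpu=1.0,Mem=0.1G"
```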
> >>
> >>
> >> Ideal for us would be the ability to define, for each GPU, a number of
> >> CPUs that is always kept available on the node. A similar feature for
> >> an amount of RAM would also help.
> >>
> >>
> >> Take for example a node that has:
> >>
> >> * four GPUs
> >>
> >> * 16 CPUs
> >>
> >>
> >> Let's assume that most jobs would work just fine with a minimum number
> >> of 2 CPUs per GPU. Then we could set in the node definition a variable
> >> such as
> >>
> >>    CpusReservedPerGpu = 2
> >>
> >> The first job to run on this node could get between 2 and 10 CPUs,
> >> leaving 6 CPUs reserved for potential incoming jobs (2 per GPU).
> >>
> >>
> >> We couldn't find a way to do this, are we missing something? We'd rather
> >> not modify the source code again :/
> >>
> >> Regards,
> >>
> >> Relu
> >
>
>

-- 
Stephan Schott Verdugo
Biochemist

Heinrich-Heine-Universitaet Duesseldorf
Institut fuer Pharm. und Med. Chemie
Universitaetsstr. 1
40225 Duesseldorf
Germany