[slurm-users] Reserve some cores per GPU

Aaron Jackson Aaron.Jackson at nottingham.ac.uk
Tue Oct 20 20:20:56 UTC 2020


I look after a very heterogeneous GPU Slurm setup, and some nodes have
relatively few cores. We use a job_submit Lua script which calculates
the number of CPU cores requested per GPU. This is then compared against
a table of 'weak nodes', each with a 'max cores per gpu' property. The
names of the nodes that cannot satisfy the request are appended to the
job_desc exc_nodes property, so the job is never scheduled on them.
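
As a rough illustration of that logic, here is a sketch in Python rather
than Lua. It is not the actual script: the node names, the per-node
limits, and the function names are all hypothetical, and it assumes a
weak node is excluded when its limit is below the requested cores per
GPU.

```python
# Hypothetical table: node name -> max CPU cores a job may use per GPU.
WEAK_NODES = {
    "gpu-small-01": 2,
    "gpu-small-02": 4,
    "gpu-mid-01": 8,
}


def cores_per_gpu(cpus, gpus):
    """Requested CPU cores per requested GPU, rounded up."""
    return -(-cpus // gpus)


def nodes_to_exclude(cpus, gpus, weak_nodes=WEAK_NODES):
    """Names of weak nodes whose per-GPU core limit is below the request."""
    if gpus <= 0:
        return []  # not a GPU job, nothing to exclude
    wanted = cores_per_gpu(cpus, gpus)
    return sorted(name for name, limit in weak_nodes.items() if limit < wanted)


def append_exc_nodes(exc_nodes, extra):
    """Append names to an existing comma-separated exclusion list."""
    current = [n for n in (exc_nodes or "").split(",") if n]
    return ",".join(current + [n for n in extra if n not in current])
```

For a job asking for 6 cores and 1 GPU, nodes_to_exclude(6, 1) picks the
two nodes limited to 2 and 4 cores per GPU, and append_exc_nodes() adds
them to whatever exclusion list the user already supplied.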

It's not particularly elegant but it does work quite well for us.

Aaron


On 20 October 2020 at 18:17 BST, Relu Patrascu wrote:

> Hi all,
>
> We have a GPU cluster and have run into this issue occasionally. Assume 
> four GPUs per node; when a user requests a GPU on such a node, and all 
> the cores, or all the RAM, the other three GPUs will be wasted for the 
> duration of the job, as slurm has no more cores or RAM available to 
> allocate those GPUs to subsequent jobs.
>
>
> We have a "soft" solution to this, but it's not ideal. That is, we 
> assigned large TresBillingWeights to CPU consumption, thus discouraging
> users from allocating many CPUs.
>
>
> Ideal for us would be the ability to define a number of CPUs to always
> be available on a node for each GPU. It would also help to have a
> similar feature for an amount of RAM.
>
>
> Take for example a node that has:
>
> * four GPUs
>
> * 16 CPUs
>
>
> Let's assume that most jobs would work just fine with a minimum number 
> of 2 CPUs per GPU. Then we could set in the node definition a variable 
> such as
>
>   CpusReservedPerGpu = 2
>
> The first job to run on this node could get between 2 and 10 CPUs, thus
> leaving at least 6 CPUs for potential incoming jobs (2 for each of the
> other three GPUs).
>
>
> We couldn't find a way to do this, are we missing something? We'd rather 
> not modify the source code again :/
>
> Regards,
>
> Relu


-- 
Research Fellow
School of Computer Science
University of Nottingham


