[slurm-users] Elastic Compute

Tue Sep 11 15:20:07 MDT 2018

Thanks for the input! I tried a few more things but wasn't able to get the
behavior I want.
Here's what I tried so far:
- Set SelectTypeParameters to "CR_CPU,CR_LLN".
- Set SelectTypeParameters to "CR_CPU,CR_Pack_Nodes". The documentation for
this parameter seems to describe the behavior I want (pack jobs as densely
as possible on instances, i.e., minimize the number of instances).
- Assign Weights to nodes as follows:
NodeName=compute-X Weight=X
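
For reference, the three variants above could look roughly like this in
slurm.conf. This is only a sketch: the SelectType line, the node names
compute-1..3, and State=CLOUD are assumptions for illustration and are not
taken from the actual configuration.

```
# Assumed select plugin; the CR_* options below need a consumable-resource plugin.
SelectType=select/cons_res

# Variant 1: prefer the least-loaded node.
SelectTypeParameters=CR_CPU,CR_LLN

# Variant 2 (use instead of variant 1): pack jobs onto as few nodes as possible.
#SelectTypeParameters=CR_CPU,CR_Pack_Nodes

# Variant 3: per-node weights; nodes with lower Weight are allocated first.
# Node names and State=CLOUD are hypothetical.
NodeName=compute-1 Weight=1 State=CLOUD
NodeName=compute-2 Weight=2 State=CLOUD
NodeName=compute-3 Weight=3 State=CLOUD
```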

The different configurations all result in the same behavior: If jobs
come in after the start of a node has been triggered, but before the node is
up and running, SLURM won't consider this resource but instead triggers
the creation of another node. As I expect this to happen pretty
regularly in the scenario I'm dealing with, that's kind of critical for me.
BTW: I'm using SLURM 18.08 and I restarted slurmctld after each change in
the configuration of course.

On Tue, Sep 11, 2018 at 00:33, Brian Haymore <
brian.haymore at utah.edu> wrote:

> I re-read the docs and I was wrong on the default behavior.  The default
> is "no", which just means don't oversubscribe the individual resources,
> whereas I thought it defaulted to 'exclusive'.  So I think I've been taking
> us down a dead end in terms of what I thought might help. :\
>
>
> I have a system here that we are running with the elastic setup, but there
> we are doing exclusive (and it's set that way in the conf) scheduling, so
> I've not run into the same circumstances you have.
>
> --
> Brian D. Haymore
> University of Utah
> Center for High Performance Computing
> 155 South 1452 East RM 405
> Salt Lake City, Ut 84112
> Phone: 801-558-1150, Fax: 801-585-5366
> http://bit.ly/1HO1N2C
>
> ________________________________________
> From: slurm-users [slurm-users-bounces at lists.schedmd.com] on behalf of
> Chris Samuel [chris at csamuel.org]
> Sent: Monday, September 10, 2018 4:17 PM
> To: slurm-users at lists.schedmd.com
> Subject: Re: [slurm-users] Elastic Compute
>
> On Tuesday, 11 September 2018 12:52:27 AM AEST Brian Haymore wrote:
>
> > I believe the default value of this would prevent jobs from sharing a
> > node.
>
> But the jobs _do_ share a node when the resources become available, it's
> just that the cloud part of Slurm is bringing up the wrong number of nodes
> compared to what it will actually use.
>
> --
>  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC