<div dir="ltr">Felix,<div><br></div><div>Right now this would require Slurm code changes. </div><div><br></div><div>Jacob </div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Sep 13, 2018 at 12:10 AM, Felix Wolfheimer <span dir="ltr"><<a href="mailto:f.wolfheimer@googlemail.com" target="_blank">f.wolfheimer@googlemail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="auto">Thanks for the confirmation, Jacob. Is it possible to change this behavior? If there's no config parameter for this, I'm fine with changing the SLURM code to achieve this. It sounds like it'd be a very local change. <div dir="auto">As for cloud setups it's a pretty common goal to minimize the number of nodes, I'd also like to submit it as a feature request, then.</div></div><br><div class="gmail_quote"><div dir="ltr">Jacob Jenson <<a href="mailto:jacob@schedmd.com" target="_blank">jacob@schedmd.com</a>> schrieb am Mi., 12. Sep. 2018, 19:47:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">
<span style="font-size:12.8px;text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Currently, Slurm marks allocated nodes needing to be booted as unavailable for other jobs until they are booted. Once the node is booted, then normal packing should happen.</span><div class="m_-2849414273875606708m_29679020584349881gmail-yj6qo" style="font-size:12.8px;text-decoration-style:initial;text-decoration-color:initial"></div><br class="m_-2849414273875606708m_29679020584349881gmail-Apple-interchange-newline">Jacob </div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Sep 12, 2018 at 7:30 AM, Eli V <span dir="ltr"><<a href="mailto:eliventer@gmail.com" rel="noreferrer" target="_blank">eliventer@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Sound like you figured it out, but I mis-remembered and switched the<br>
case on CR_LLN. Setting it spreads the jobs out across the nodes, not
filling one up first. Also, I believe it can be set per partition as
well.
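
Roughly, a sketch of both forms in slurm.conf (the SelectType line and the
partition/node names below are placeholders, not taken from a real config):

    # Global: schedule onto the least-loaded nodes for all partitions
    SelectType=select/cons_res
    SelectTypeParameters=CR_CPU,CR_LLN

    # Or per partition only: just this partition spreads jobs out
    PartitionName=spread Nodes=compute-[1-4] LLN=YES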
<div class="m_-2849414273875606708m_29679020584349881HOEnZb"><div class="m_-2849414273875606708m_29679020584349881h5">On Tue, Sep 11, 2018 at 5:24 PM Felix Wolfheimer<br>
<<a href="mailto:f.wolfheimer@googlemail.com" rel="noreferrer" target="_blank">f.wolfheimer@googlemail.com</a>> wrote:<br>
>
> Thanks for the input! I tried a few more things but wasn't able to get the behavior I want.
> Here's what I tried so far (consolidated into a config sketch after the list):
> - Set SelectTypeParameters to "CR_CPU,CR_LLN".
> - Set SelectTypeParameters to "CR_CPU,CR_Pack_Nodes". The documentation for this parameter seems to describe the behavior I want (pack jobs as densely as possible on instances, i.e., minimize the number of instances).
> - Assign Weights to nodes as follows:
> NodeName=compute-X Weight=X
>
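> A consolidated slurm.conf sketch of the above (node names, weights, and the
> State=CLOUD flag are placeholders/assumptions for the elastic setup):
>
>     SelectType=select/cons_res
>     SelectTypeParameters=CR_CPU,CR_Pack_Nodes   # or CR_CPU,CR_LLN
>
>     # lower Weight is allocated first, so jobs pack onto low-numbered nodes
>     NodeName=compute-1 Weight=1 State=CLOUD
>     NodeName=compute-2 Weight=2 State=CLOUD
>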
> All of these configurations result in the same behavior: if jobs come in while the start of a node has already been triggered but the node is not yet up and running, SLURM won't consider that resource and instead triggers the creation of another node. Since I expect this to happen pretty regularly in the scenario I'm dealing with, that's critical for me. BTW: I'm using SLURM 18.08, and of course I restarted slurmctld after each configuration change.
>
> On Tue, Sep 11, 2018 at 00:33, Brian Haymore <brian.haymore@utah.edu> wrote:
>>
>> I re-read the docs and I was wrong about the default behavior. The default is "no", which just means don't oversubscribe the individual resources, whereas I thought it defaulted to 'exclusive'. So I think I've been taking us down a dead end in terms of what I thought might help. :\
>>
>>
>> I have a system here that we are running with the elastic setup, but there we are doing exclusive scheduling (and it's set that way in the conf), so I've not run into the same circumstances you have.
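>>
>> For reference, a sketch of what that exclusive setup can look like in
>> slurm.conf (partition and node names are placeholders):
>>
>>     # whole nodes only; jobs never share a node in this partition
>>     PartitionName=elastic Nodes=compute-[1-8] OverSubscribe=EXCLUSIVE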
>>
>> --
>> Brian D. Haymore
>> University of Utah
>> Center for High Performance Computing
>> 155 South 1452 East RM 405
>> Salt Lake City, Ut 84112
>> Phone: 801-558-1150, Fax: 801-585-5366
>> http://bit.ly/1HO1N2C
>>
>> ________________________________________
>> From: slurm-users [slurm-users-bounces@lists.schedmd.com] on behalf of Chris Samuel [chris@csamuel.org]
>> Sent: Monday, September 10, 2018 4:17 PM
>> To: slurm-users@lists.schedmd.com
>> Subject: Re: [slurm-users] Elastic Compute
>>
>> On Tuesday, 11 September 2018 12:52:27 AM AEST Brian Haymore wrote:
>>
>> > I believe the default value of this would prevent jobs from sharing a node.
>>
>> But the jobs _do_ share a node when the resources become available, it's just
>> that the cloud part of Slurm is bringing up the wrong number of nodes compared
>> to what it will actually use.
>>
>> --
>> Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC