<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Sorry for the late reply.</p>
<p>For my site, I used the optional ":" separator to ensure at least
4 nodes were up, e.g. nid[10-20]:4<br>
This means at least 4 nodes. Those nodes do not have to be the
same 4 at any given time, so if a node that used to be idle is down
but 4 others are up, that one will not be brought back up. I don't
see this setting having much to do with bringing nodes up at all,
except when you first start slurmctld and the setting is not met.
Once there are jobs running on any of the listed nodes, those nodes
count toward the number. That is my experience with the small
numbers I used. YMMV.<br>
</p>
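<p>For reference, a minimal slurm.conf sketch of that kind of setup.
The SuspendTime value is only an illustrative assumption, the node
range is the one from the documentation example, and the rest of the
power saving settings (SuspendProgram, ResumeProgram and so on) are
assumed to be configured already:</p>
<pre>
# Power down nodes after 10 minutes idle (illustrative value, not from my site)
SuspendTime=600
# Exempt 4 nodes out of nid[10-20] from being powered down
SuspendExcNodes=nid[10-20]:4
</pre>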
<p>I have also explicitly stated nodes without the separator, which
does work. I do that when I am trying to look at a node that is
idle without a job on it. That stops slurm from shutting it down
while I am looking at it.</p>
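<p>A sketch of that second form, with a single node listed and no
":" count (nid15 is just a hypothetical node name for illustration):</p>
<pre>
# Never power down this specific node, regardless of how long it is idle
SuspendExcNodes=nid15
</pre>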
<p>I do agree that being able to say "keep at least X nodes up and
idle" would be nice, but that is not how I see this documented or
working.</p>
<p>Brian Andrus<br>
</p>
<div class="moz-cite-prefix">On 11/23/2023 5:12 AM, Davide DelVento
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAAX1q8Zb92ZMBQp3h+UqZV4-cqLeMPwFV4z01C=XB3A4VbnGEg@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">Thanks for confirming, Brian. That was my
understanding as well. Do you have it working that way on a
machine you have access to? If so, I'd be interested to see the
config file, because that's not the behavior I am experiencing
in my tests.
<div>
<div>
<div>In fact, in my tests Slurm will not bring down those "X
nodes" but will not bring them up either, *unless* there
is a job targeted to those. I may have something
misconfigured, and I'd love to fix that.</div>
</div>
</div>
<div><br>
</div>
<div>Thanks!</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Nov 22, 2023 at
5:46 PM Brian Andrus &lt;<a href="mailto:toomuchit@gmail.com"
moz-do-not-send="true" class="moz-txt-link-freetext">toomuchit@gmail.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>As I understand it, that setting means "Always have at
least X nodes up", which includes running jobs. So it
stops any wait time for the first X jobs being submitted,
but any jobs after that will need to wait for the power_up
sequence.</p>
<p>Brian Andrus<br>
</p>
<div>On 11/22/2023 6:58 AM, Davide DelVento wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>I've started playing with powersave and have a
question about SuspendExcNodes. The documentation at <a
href="https://slurm.schedmd.com/power_save.html"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">https://slurm.schedmd.com/power_save.html</a>
says</div>
<div><br>
</div>
<div><span
style="color:rgb(70,84,92);font-family:'Source Sans Pro',Helvetica,Arial,sans-serif;font-size:20px">For
example </span><code
style="box-sizing:border-box;margin:0px 0px 1.5em;padding:0px 0.2em;border:1px solid rgb(232,232,232);font-variant-numeric:inherit;font-variant-east-asian:inherit;font-variant-alternates:inherit;font-stretch:inherit;font-size:20px;line-height:1.5em;font-family:'Source Code Pro',monospace;font-kerning:inherit;font-feature-settings:inherit;vertical-align:baseline;display:inline;overflow:auto;border-radius:5px;background-color:rgb(232,232,232);color:rgb(70,84,92)">nid[10-20]:4</code><span
style="color:rgb(70,84,92);font-family:'Source Sans Pro',Helvetica,Arial,sans-serif;font-size:20px"> will
prevent 4 usable nodes (i.e IDLE and not DOWN,
DRAINING or already powered down) in the set </span><code
style="box-sizing:border-box;margin:0px 0px 1.5em;padding:0px 0.2em;border:1px solid rgb(232,232,232);font-variant-numeric:inherit;font-variant-east-asian:inherit;font-variant-alternates:inherit;font-stretch:inherit;font-size:20px;line-height:1.5em;font-family:'Source Code Pro',monospace;font-kerning:inherit;font-feature-settings:inherit;vertical-align:baseline;display:inline;overflow:auto;border-radius:5px;background-color:rgb(232,232,232);color:rgb(70,84,92)">nid[10-20]</code><span
style="color:rgb(70,84,92);font-family:'Source Sans Pro',Helvetica,Arial,sans-serif;font-size:20px"> from
being powered down.</span><br>
</div>
<div><br>
</div>
<div>I initially interpreted that as "Slurm will try to
keep 4 nodes powered on and idle as much as possible", which
would have reduced the wait time for new jobs targeting
those nodes. Instead, it appears to mean "Slurm will
not shut off the last 4 nodes which are idle in that
partition, but it will not turn on nodes which it
shut off earlier unless jobs are scheduled on them".</div>
<div><br>
</div>
<div>Most notably, if the 4 idle nodes are allocated
to other jobs (and so they are not idle anymore), Slurm
does not turn on any nodes which have been shut off
earlier, so it's possible (and depending on workloads
perhaps even common) to have no idle nodes powered on
regardless of the SuspendExcNodes setting.</div>
<div><br>
</div>
<div>Is that how it works, or do I have something else in
my settings which is causing this unexpected-to-me
behavior? I think I can live with it, but IMHO it
would have been better if Slurm attempted to turn on
nodes preemptively to match the requested
SuspendExcNodes, rather than waiting for job
submissions.</div>
<div><br>
</div>
<div>Thanks and Happy Thanksgiving to people in the USA</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</body>
</html>