<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>As I understand it, that setting means "always have at least X
nodes up", and nodes that are running jobs count toward X. So the
first X jobs submitted start without any wait, but any jobs after
that will need to wait for the power_up sequence.</p>
<p>Brian Andrus<br>
</p>
<div class="moz-cite-prefix">On 11/22/2023 6:58 AM, Davide DelVento
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAAX1q8YAx9BL2iUdFQXbtvwAxuXhKcQE5vy-d9=y71tmL1jLhQ@mail.gmail.com">
<div dir="ltr">
<div>I've started playing with powersave and have a question
about SuspendExcNodes. The documentation at <a
href="https://slurm.schedmd.com/power_save.html"
moz-do-not-send="true" class="moz-txt-link-freetext">https://slurm.schedmd.com/power_save.html</a>
says</div>
<div><br>
</div>
<div>For example <code>nid[10-20]:4</code> will prevent 4 usable
nodes (i.e IDLE and not DOWN, DRAINING or already powered down) in
the set <code>nid[10-20]</code> from being powered down.<br>
</div>
<div><br>
</div>
<div>I initially interpreted that as "Slurm will try to keep 4
idle nodes powered on as much as possible", which would have
reduced the wait time for new jobs targeting those nodes.
Instead, it appears to mean "Slurm will not shut off the last 4
idle nodes in that partition, but it will not turn on nodes it
shut off earlier unless jobs are scheduled on them".</div>
<div><br>
</div>
<div>Most notably, if the 4 idle nodes are allocated to other
jobs (and so are no longer idle), Slurm does not turn on any of
the nodes it shut off earlier. So it is possible (and, depending
on the workload, perhaps even common) to have no idle nodes
powered on, regardless of the SuspendExcNodes setting.</div>
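<div><br>
</div>
<div>The scenario above can be sketched as a toy Python model. It
only encodes the documentation's wording as I read it (IDLE nodes
are the "usable" ones, and the count limits power-down only); it is
not Slurm's actual code, and <code>Node</code>,
<code>pick_nodes_to_suspend</code> and <code>run_suspend_pass</code>
are hypothetical names made up for illustration.</div>

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    state: str  # one of: IDLE, ALLOCATED, DOWN, DRAINING, POWERED_DOWN


def pick_nodes_to_suspend(nodes, keep_count):
    """Toy model: spare the first keep_count IDLE nodes, suspend the other IDLE ones.

    Assumption: only IDLE nodes count as "usable". Nothing here ever
    powers a node back up, which is the behavior being questioned.
    """
    idle = [n for n in nodes if n.state == "IDLE"]
    return idle[keep_count:]


def run_suspend_pass(nodes, keep_count):
    """Apply one suspend pass; return names of nodes still idle and powered on."""
    for node in pick_nodes_to_suspend(nodes, keep_count):
        node.state = "POWERED_DOWN"
    return [n.name for n in nodes if n.state == "IDLE"]


# Hypothetical cluster matching the nid[10-20]:4 example: 11 idle nodes.
cluster = [Node(f"nid{i}", "IDLE") for i in range(10, 21)]

idle_after_first_pass = run_suspend_pass(cluster, keep_count=4)

# Jobs now land on the four nodes that were kept on.
for node in cluster:
    if node.state == "IDLE":
        node.state = "ALLOCATED"

idle_after_jobs = run_suspend_pass(cluster, keep_count=4)

print(idle_after_first_pass)
print(idle_after_jobs)
```

<div>In this model the first pass leaves four idle nodes on, and
once those four are allocated the second pass leaves none, because
the count never triggers a power-up.</div>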
<div><br>
</div>
<div>Is that how it works, or is something else in my
configuration causing this unexpected-to-me behavior? I think I
can live with it, but IMHO it would have been better if Slurm
preemptively turned nodes on to match the requested
SuspendExcNodes count, rather than waiting for job
submissions.</div>
<div><br>
</div>
<div>Thanks, and Happy Thanksgiving to people in the USA.</div>
</div>
</blockquote>
</body>
</html>