<div dir="ltr"><div>In case it's useful to others: I've been able to get this working by having the "no action" script stop the slurmd daemon and start it *with the -b option*.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Oct 6, 2023 at 4:28 AM Ole Holm Nielsen <<a href="mailto:Ole.H.Nielsen@fysik.dtu.dk">Ole.H.Nielsen@fysik.dtu.dk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Davide,<br>
<br>
On 10/5/23 15:28, Davide DelVento wrote:<br>
> IMHO, "pretending" to power down nodes defies the logic of the Slurm<br>
> power_save plugin. <br>
> <br>
> And it is sure useless ;)<br>
> But I was using the suggestion from <br>
> <a href="https://slurm.schedmd.com/power_save.html" rel="noreferrer" target="_blank">https://slurm.schedmd.com/power_save.html</a> which says<br>
> <br>
> You can also configure Slurm with programs that perform no action as <br>
> *SuspendProgram* and *ResumeProgram* to assess the potential impact of <br>
> power saving mode before enabling it.<br>
<br>
I had not noticed the above sentence in the power_save manual before! So <br>
I decided to test a "no action" power saving script, similar to what you <br>
have done, applying it to a test partition. I conclude that "no action" <br>
power saving DOES NOT WORK, at least in Slurm 23.02.5. So I opened a bug <br>
report <a href="https://bugs.schedmd.com/show_bug.cgi?id=17848" rel="noreferrer" target="_blank">https://bugs.schedmd.com/show_bug.cgi?id=17848</a> to find out if the <br>
documentation is obsolete, or if there may be a bug. Please follow that <br>
bug to find out the answer from SchedMD.<br>
<br>
What I *believe* (but not with 100% certainty) really happens with power <br>
saving in the current Slurm versions is what I wrote yesterday:<br>
<br>
> Slurmctld expects suspended nodes to *really* power<br>
> down (slurmd is stopped). When slurmctld resumes a suspended node, it<br>
> expects slurmd to start up when the node is powered on. There is a<br>
> ResumeTimeout parameter which I've set to about 15-30 minutes in case of<br>
> delays due to BIOS updates and the like - the default of 60 seconds is<br>
> WAY too small!<br>
<br>
I hope this helps,<br>
Ole<br>
<br>
</blockquote></div>
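<div dir="ltr"><br>For anyone who wants to try the workaround described at the top of this message (stop slurmd on suspend, start it with -b on resume), here is a rough Python sketch of what such a "no action" helper might look like. Only the stop/start-with-<code>-b</code> idea comes from this thread. Everything else is an assumption for illustration: that the script runs on the slurmctld host, that it can reach the nodes with passwordless <code>ssh</code>, that slurmd is stopped via <code>systemctl</code>, and that <code>slurmd</code> is on the remote PATH. The function names are hypothetical. Slurm passes a hostlist expression to SuspendProgram and ResumeProgram, so the sketch expands it with <code>scontrol show hostnames</code>.</div>
<pre>
```python
import subprocess
import sys


def remote_command(action):
    """Return the command to run on a node for 'suspend' or 'resume'."""
    if action == "suspend":
        # Assumption: slurmd is managed by systemd on the compute nodes.
        return ["systemctl", "stop", "slurmd"]
    if action == "resume":
        # Start slurmd with -b, as described in the message above.
        # Assumption: slurmd is on the remote PATH and is started directly.
        return ["slurmd", "-b"]
    raise ValueError("action must be 'suspend' or 'resume'")


def build_ssh_command(node, action):
    """Build the full ssh command line for one node (assumes ssh access)."""
    return ["ssh", node] + remote_command(action)


def expand_hostlist(hostlist):
    """Expand a Slurm hostlist such as 'node[01-03]' into node names."""
    result = subprocess.run(
        ["scontrol", "show", "hostnames", hostlist],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def main(action, hostlist):
    """Run the action on every node; return 0 only if all commands succeed."""
    failures = 0
    for node in expand_hostlist(hostlist):
        completed = subprocess.run(build_ssh_command(node, action))
        if completed.returncode != 0:
            failures += 1
    return 0 if failures == 0 else 1


if __name__ == "__main__":
    if len(sys.argv) == 3:
        sys.exit(main(sys.argv[1], sys.argv[2]))
    print("usage: noaction.py suspend|resume HOSTLIST")
```
</pre>
<div dir="ltr">Since Slurm calls SuspendProgram and ResumeProgram with only the hostlist as argument, two small wrapper scripts that add "suspend" or "resume" would be needed. I have not tested this exact script, so please treat it as a starting point only.</div>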