[slurm-users] Slurm powersave
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Fri Oct 6 10:26:12 UTC 2023
Hi Davide,
On 10/5/23 15:28, Davide DelVento wrote:
> > IMHO, "pretending" to power down nodes defies the logic of the Slurm
> > power_save plugin.
>
> And it is sure useless ;)
> But I was using the suggestion from
> https://slurm.schedmd.com/power_save.html which says
>
> You can also configure Slurm with programs that perform no action as
> *SuspendProgram* and *ResumeProgram* to assess the potential impact of
> power saving mode before enabling it.
I had not noticed the above sentence in the power_save manual before! So
I decided to test a "no action" power saving script, similar to what you
have done, applying it to a test partition. I conclude that "no action"
power saving DOES NOT WORK, at least in Slurm 23.02.5. So I opened a bug
report https://bugs.schedmd.com/show_bug.cgi?id=17848 to find out if the
documentation is obsolete, or if there may be a bug. Please follow that
bug to find out the answer from SchedMD.
What I *believe* (though I am not 100% certain) really happens with power
saving in current Slurm versions is what I wrote yesterday:
> Slurmctld expects suspended nodes to *really* power
> down (slurmd is stopped). When slurmctld resumes a suspended node, it
> expects slurmd to start up when the node is powered on. There is a
> ResumeTimeout parameter which I've set to about 15-30 minutes in case of
> delays due to BIOS updates and the like - the default of 60 seconds is
> WAY too small!
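As a rough sketch, the settings discussed above might look like the
slurm.conf fragment below. The program paths and the SuspendTime value are
placeholders chosen for illustration, not taken from a real cluster; only
the ResumeTimeout range of 15-30 minutes comes from the quoted text.

```
# slurm.conf fragment (illustrative only; adapt to your site)

# Hypothetical path: this program must really power the nodes down,
# so that slurmd stops.
SuspendProgram=/usr/local/sbin/node_suspend

# Hypothetical path: this program must power the nodes on, so that
# slurmd starts up and registers with slurmctld.
ResumeProgram=/usr/local/sbin/node_resume

# Assumed value: seconds a node must be idle before it is suspended.
SuspendTime=1800

# 30 minutes, allowing for slow boots (BIOS updates and the like).
# The default is 60 seconds.
ResumeTimeout=1800
```

Both programs are called with the affected node names as their argument,
and changes to slurm.conf need to be picked up by slurmctld before they
take effect.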
I hope this helps,
Ole