[slurm-users] Slurm powersave

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Fri Oct 6 10:26:12 UTC 2023


Hi Davide,

On 10/5/23 15:28, Davide DelVento wrote:
>     IMHO, "pretending" to power down nodes defies the logic of the Slurm
>     power_save plugin. 
> 
> And it is sure useless ;)
> But I was using the suggestion from 
> https://slurm.schedmd.com/power_save.html which says
> 
> You can also configure Slurm with programs that perform no action as 
> *SuspendProgram* and *ResumeProgram* to assess the potential impact of 
> power saving mode before enabling it.

I had not noticed the above sentence in the power_save manual before!  So 
I decided to test a "no action" power saving script, similar to what you 
have done, applying it to a test partition.  I conclude that "no action" 
power saving DOES NOT WORK, at least in Slurm 23.02.5.  So I opened a bug 
report https://bugs.schedmd.com/show_bug.cgi?id=17848 to find out if the 
documentation is obsolete, or if there may be a bug.  Please follow that 
bug to find out the answer from SchedMD.
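
For readers who have not seen one, a "no action" program of the kind the 
manual describes might look roughly like the sketch below.  This is only 
an illustration, not the exact script used in my test.  The log path and 
the function names are assumptions; the one thing taken from the Slurm 
documentation is that slurmctld passes the affected nodes to the program 
as a single hostlist expression argument.

```python
#!/usr/bin/env python3
# Hypothetical "no action" stand-in for SuspendProgram or ResumeProgram.
# It only records the request; it never touches the nodes.
import os
import sys
import time

# Assumed log location; pick any file writable by SlurmUser.
LOG_PATH = "/var/log/slurm/power_save_noop.log"


def format_log_line(action, hostlist, now=None):
    """Build one log line describing the request made by slurmctld."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return f"{stamp} {action} {hostlist} (no action taken)"


def main(argv, log_path=LOG_PATH):
    # slurmctld passes the nodes as one hostlist expression,
    # e.g. "node[001-004]", in the first argument.
    hostlist = argv[1]
    # Use the script's own name as the action label, so the same file can
    # be installed under two names (e.g. noop_suspend and noop_resume).
    action = os.path.basename(argv[0])
    with open(log_path, "a") as log:
        log.write(format_log_line(action, hostlist) + "\n")
    return 0


# Only act when a hostlist argument was actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Since such a script leaves slurmd running on the "suspended" nodes, it 
exercises exactly the case that failed for me in 23.02.5.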

What I *believe* (but not with 100% certainty) really happens with power 
saving in the current Slurm versions is what I wrote yesterday:

>     Slurmctld expects suspended nodes to *really* power
>     down (slurmd is stopped).  When slurmctld resumes a suspended node, it
>     expects slurmd to start up when the node is powered on.  There is a
>     ResumeTimeout parameter which I've set to about 15-30 minutes in case of
>     delays due to BIOS updates and the like - the default of 60 seconds is
>     WAY too small!
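
As a rough sketch, the relevant slurm.conf settings could look like the 
excerpt below.  The parameter names are the standard power saving options; 
the program paths are hypothetical and the values are only examples (the 
ResumeTimeout corresponds to the upper end of the 15-30 minutes mentioned 
above), so adapt them to your own site.

```
# slurm.conf excerpt (illustrative values, hypothetical paths)
SuspendProgram=/usr/local/sbin/node_suspend
ResumeProgram=/usr/local/sbin/node_resume
# Idle time in seconds before a node is suspended
SuspendTime=1800
# Seconds allowed for a node to complete its power down
SuspendTimeout=120
# Seconds allowed for slurmd to respond after a resume (default is 60)
ResumeTimeout=1800
```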

I hope this helps,
Ole


