[slurm-users] slurm power save question

Brian Andrus toomuchit at gmail.com
Thu Nov 23 00:44:30 UTC 2023


As I understand it, that setting means "Always have at least X nodes 
up", and nodes running jobs count toward X. So the first X jobs 
submitted won't have to wait, but any jobs after that will need to wait 
for the power_up sequence.
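
For comparison, here is a toy Python model of the behavior Davide 
reports below. It is only a sketch, not Slurm's code: it assumes that 
only IDLE nodes count as usable (per the documentation he quotes) and 
that nothing is ever resumed to refill the count (his observation). The 
function name, node names and state strings are made up for 
illustration.

```python
# Toy model of the SuspendExcNodes count as described in this thread.
# This is NOT Slurm's implementation; names and states are illustrative.

USABLE_STATE = "IDLE"  # the quoted docs count only IDLE nodes as "usable"


def plan_suspend(states, keep):
    """Split idle nodes into (exempt, suspend_candidates).

    states: dict mapping node name -> state string
    keep:   the count after the ':' in SuspendExcNodes, e.g. 4
    """
    idle = sorted(name for name, st in states.items() if st == USABLE_STATE)
    # Assumption: which idle nodes are kept is arbitrary; keep the first ones.
    return idle[:keep], idle[keep:]


# Six idle nodes with a count of 4: two nodes may be powered down.
states = {f"nid{i}": "IDLE" for i in range(10, 16)}
exempt, suspend = plan_suspend(states, 4)
print(exempt, suspend)

# Power down the candidates, then let jobs take the four kept nodes.
for name in suspend:
    states[name] = "POWERED_DOWN"
for name in exempt:
    states[name] = "ALLOCATED"

# No node is idle now, and the model never resumes one to refill the count.
exempt_after, suspend_after = plan_suspend(states, 4)
print(exempt_after, suspend_after)
```

In this model the second call returns two empty lists: no idle node is 
left powered up, and the next job has to wait for a resume.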

Brian Andrus

On 11/22/2023 6:58 AM, Davide DelVento wrote:
> I've started playing with powersave and have a question about 
> SuspendExcNodes. The documentation at 
> https://slurm.schedmd.com/power_save.html says
>
> For example |nid[10-20]:4| will prevent 4 usable nodes (i.e. IDLE and 
> not DOWN, DRAINING or already powered down) in the set 
> |nid[10-20]| from being powered down.
>
> I initially interpreted that as "Slurm will try to keep 4 idle nodes 
> powered on whenever possible", which would have reduced the wait time 
> for new jobs targeting those nodes. Instead, it appears to mean "Slurm 
> will not shut off the last 4 idle nodes in that set, but it will not 
> turn on nodes which it shut off earlier unless jobs are scheduled on 
> them".
>
> Most notably, once the 4 idle nodes are allocated to other jobs (and 
> so are not idle anymore), Slurm does not turn on any of the nodes it 
> shut off earlier, so it's possible (and depending on workloads 
> perhaps even common) to have no idle nodes powered on regardless of 
> the SuspendExcNodes setting.
>
> Is that how it works, or is something else in my configuration 
> causing this unexpected-to-me behavior? I think I can live with it, 
> but IMHO it would have been better if Slurm powered on nodes 
> preemptively to match the requested SuspendExcNodes count, rather 
> than waiting for job submissions.
>
> Thanks and Happy Thanksgiving to people in the USA