Hello,
We are enabling global power saving for Slurm, using the config shown below. However, we are experiencing an issue where Slurm is taking nodes out of power saving mode.
We are using configless mode and dynamic nodes.
slurmctld logs:
[2025-01-08T09:54:18.096] Cleared POWER_SAVE flag from nodes dev3-cbf-debug-e2std4-s0-[0-1],dev3-cbf-infra-e2std4-s0-[0-1],dev3-cbf-slurmd-e2std4-[0-4]
[2025-01-08T09:54:18.097] debug: power_save module disabled, SuspendTime < 0
[2025-01-08T10:05:09.683] debug: power_save module disabled, SuspendTime < 0
[2025-01-08T10:12:00.447] debug: power_save module disabled, SuspendTime < 0
[2025-01-08T10:17:27.077] debug: power_save module disabled, SuspendTime < 0
[2025-01-08T10:20:06.735] debug: power_save module disabled, SuspendTime < 0
[2025-01-08T10:26:16.663] debug: power_save module disabled, SuspendTime < 0
[2025-01-08T10:26:19.017] debug: power_save module disabled, SuspendTime < 0
[2025-01-08T10:46:25.078] debug: power_save module disabled, SuspendTime < 0
[2025-01-08T15:58:54.080] debug: power_save module disabled, SuspendTime < 0
[2025-01-08T15:58:56.452] debug: power_save module disabled, SuspendTime < 0
[2025-01-09T00:08:01.993] debug: power_save module disabled, SuspendTime < 0
[2025-01-09T00:08:02.157] debug: power_save module disabled, SuspendTime < 0
[2025-01-09T00:14:26.817] debug: power_save module disabled, SuspendTime < 0
[2025-01-09T00:14:26.971] debug: power_save module disabled, SuspendTime < 0
We manage the infrastructure through Puppet, and I can confirm that SuspendTime=3600 is set in the deployed slurm.conf, even though the log reports SuspendTime < 0.
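To cross-check the value in the file against what the running slurmctld has actually loaded, something like the following could be run on the controller host. This is only a minimal sketch: it assumes `scontrol show config` prints `Key = Value` lines, and the helper names (`parse_scontrol_config`, `power_save_settings`, `running_power_save_settings`) are made up for illustration and are not part of Slurm.

```python
import subprocess


def parse_scontrol_config(text):
    """Parse 'Key = Value' lines (as printed by `scontrol show config`) into a dict."""
    settings = {}
    for line in text.splitlines():
        # Split on the first '=' only; header lines without '=' are skipped.
        key, sep, value = line.partition("=")
        if sep and key.strip():
            settings[key.strip()] = value.strip()
    return settings


def power_save_settings(settings):
    """Keep only the Suspend*/Resume* entries, which control power saving."""
    return {k: v for k, v in settings.items() if k.startswith(("Suspend", "Resume"))}


def running_power_save_settings():
    """Ask the running controller for its config (requires scontrol on PATH)."""
    out = subprocess.run(
        ["scontrol", "show", "config"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return power_save_settings(parse_scontrol_config(out))
```

Calling `running_power_save_settings()` and comparing its `SuspendTime` entry with the 3600 in the Puppet-managed file would show whether the controller is really running with the value we expect.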
Slurm version: 24.05.2
# SLURM POWER SAVING FEATURES
SuspendProgram=/etc/slurm/suspend_nodes_slurm.par
ResumeProgram=/etc/slurm/resume_nodes_slurm.par
SuspendTimeout=600
ResumeTimeout=900
ResumeRate=100
SuspendRate=100
SuspendTime=3600
SlurmctldParameters=enable_configless,cloud_dns,idle_on_node_suspend
DebugFlags=Power