[slurm-users] Slurm-power-management

Werf, C.G. van der (Carel) C.G.vanderWerf at uu.nl
Sat Jun 17 07:06:20 UTC 2023

I have an HPC cluster with 9 compute nodes controlled by Slurm. All of them are identical and in the same partition.

I have set up Slurm's power saving mode, which works quite well.
A node is shut down after it has been idle for 30 minutes, and is resumed on demand.

The load on the cluster as a whole is fairly low during some periods. A side effect is that node01 is always the first node to be resumed, while node08 is almost never resumed.
(I started recording the number of resume operations about 8 months ago. Since then, node01 has been resumed 7 times as often as node08, node02 5 times as often, and so on.)
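
For illustration, a per-node tally like the one described could be produced with a short script. This is only a sketch under stated assumptions: it supposes the resume script appends one line per resumed node to a log, in the hypothetical form "<timestamp> <nodename>". The log format and the function names are made up for the example and are not part of Slurm.

```python
from collections import Counter


def count_resumes(lines):
    """Count resume events per node.

    Assumes each line has the hypothetical form '<timestamp> <nodename>'.
    Blank or malformed lines are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # not a valid log entry
        # Normalise case so NODE01 and node01 are counted together.
        counts[parts[1].lower()] += 1
    return counts


def resume_ratio(counts, node_a, node_b):
    """Return how many times as often node_a was resumed as node_b."""
    if counts[node_b] == 0:
        raise ValueError(f"{node_b} has no recorded resumes")
    return counts[node_a] / counts[node_b]


# Made-up sample entries, only to show the expected input shape.
sample_log = [
    "2023-01-05T08:00:00 node01",
    "2023-01-06T09:30:00 NODE01",
    "2023-01-07T10:15:00 node02",
    "",
    "2023-01-08T11:45:00 node01",
]

counts = count_resumes(sample_log)
for node, n in sorted(counts.items()):
    print(f"{node}: {n}")
```

Comparing two nodes with resume_ratio(counts, "node01", "node08") on a real log would give the kind of "7 times as often" figure quoted above.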

I am looking for a way to "randomize" the order in which nodes are resumed, so that the resumes are spread more evenly across the nodes.

Does anyone have an idea how to achieve this?

With Regards,

| Carel van der Werf | 
| Developer/Administrator Linux | ICT-Bèta | Department of Science     | Utrecht University |
