[slurm-users] Monitoring node power values: A new "showpower" tool

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Wed Jan 12 06:48:19 UTC 2022


Slurm can be configured to monitor node power and energy; see 
AcctGatherEnergyType in the slurm.conf manual page.  It is simple to 
enable the RAPL type acct_gather_energy/rapl, and I have written a 
section about this in my Wiki page[1].  I also recommend the 
"turbostat" command for monitoring a single node.  Note that the RAPL 
values cover only CPU+RAM power, not the power of the entire node.
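
To illustrate what a RAPL-based power number means, here is a rough 
Python sketch that turns two readings of a RAPL energy counter into an 
average power.  It assumes the Linux powercap sysfs files under 
/sys/class/powercap/intel-rapl:0 (package 0 only; reading them may 
require root), and the function names are my own.  It is not meant to 
show how Slurm's acct_gather_energy/rapl plugin obtains its values.

```python
import time

# Assumed sysfs location of the package-0 RAPL domain (Linux powercap framework).
RAPL_DIR = "/sys/class/powercap/intel-rapl:0"


def read_int(path):
    """Read a single integer from a sysfs file."""
    with open(path) as f:
        return int(f.read().strip())


def average_watts(e0_uj, e1_uj, seconds, max_range_uj):
    """Average power in watts between two energy readings in microjoules.

    Handles at most one counter wrap-around between the two readings.
    """
    delta = e1_uj - e0_uj
    if delta < 0:
        # The counter wrapped past its maximum range between the readings.
        delta += max_range_uj
    return delta / 1e6 / seconds


def sample_package_power(interval=1.0):
    """Sample the package-0 energy counter twice, `interval` seconds apart."""
    max_range = read_int(RAPL_DIR + "/max_energy_range_uj")
    e0 = read_int(RAPL_DIR + "/energy_uj")
    time.sleep(interval)
    e1 = read_int(RAPL_DIR + "/energy_uj")
    return average_watts(e0, e1, interval, max_range)
```

Calling sample_package_power() on a node gives the power of one CPU 
package only, which is another reminder that RAPL numbers are not the 
whole-node power.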

The power and energy values can currently be obtained from Slurm only 
with "scontrol show node xxx"; it would be desirable to get this 
information from sinfo as well (see Slurm bug 13083).
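
As a minimal sketch of pulling these values out of "scontrol show node" 
output, the Python below picks out per-node power fields.  The field 
names CurrentWatts and AveWatts are what I assume the output contains; 
please check them against your Slurm version.  The function names are 
hypothetical, and this is not the showpower script itself.

```python
import subprocess

# Assumed field names in "scontrol show node" output; verify on your Slurm version.
POWER_FIELDS = ("CurrentWatts", "AveWatts")


def parse_node_power(scontrol_text):
    """Return {nodename: {field: int or None}} from "scontrol show node" text.

    Non-numeric values such as "n/s" are stored as None.  Tokens are split
    on whitespace, so free-text fields containing "=" could in principle
    confuse this simple parser.
    """
    result = {}
    node = None
    for token in scontrol_text.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # not a key=value token
        if key == "NodeName":
            node = value
            result[node] = {}
        elif node is not None and key in POWER_FIELDS:
            result[node][key] = int(value) if value.isdigit() else None
    return result


def query_nodes(nodelist):
    """Run "scontrol show node <nodelist>" and return its standard output."""
    proc = subprocess.run(
        ["scontrol", "show", "node", nodelist],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout
```

On a Slurm host, parse_node_power(query_nodes("xxx")) would then return 
one dictionary entry per node, with "xxx" replaced by a node name or 
node list.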

To print power and energy values conveniently for a set of nodes, 
I've written a small script, showpower[2].  I'd appreciate suggestions 
for improvements.

Question: Do other sites have experience with other AcctGatherEnergyType 
settings such as ipmi or external sensors?

Best regards,
Ole

[1] 
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#power-monitoring-and-management

[2] https://github.com/OleHolmNielsen/Slurm_tools/tree/master/nodes

-- 
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


