[slurm-users] Monitoring node power values: A new "showpower" tool
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Wed Jan 12 06:48:19 UTC 2022
Slurm can be configured to monitor node power and energy; see
AcctGatherEnergyType in the slurm.conf manual page. It is simple to
enable the RAPL type acct_gather_energy/rapl, and I have written a
section about this in my Wiki page[1]. For monitoring a single node I
also recommend the "turbostat" command. Note that the RAPL values
cover only CPU and RAM power, not the power of the entire node.
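RAPL exposes cumulative energy counters rather than instantaneous power, so any power figure derived from it is an energy difference divided by the elapsed time. Below is a minimal Python sketch of that arithmetic. It assumes the Linux powercap interface, where a package domain is typically found at /sys/class/powercap/intel-rapl:0 with the files energy_uj (microjoules) and max_energy_range_uj (the wraparound range). The path and the function names are illustrative assumptions, and this is not necessarily how Slurm's rapl plugin reads the hardware.

```python
from __future__ import annotations

import time
from pathlib import Path

# Assumed location of the first RAPL package domain; numbering and availability
# vary between CPUs and kernels, and reading energy_uj may require root.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def average_watts(e_start_uj: int, e_end_uj: int, seconds: float,
                  max_range_uj: int) -> float:
    """Average power in watts between two cumulative energy readings (microjoules).

    If the end reading is smaller than the start reading, the counter is assumed
    to have wrapped around once, so the range is added back.
    """
    if seconds <= 0:
        raise ValueError("interval must be positive")
    delta_uj = e_end_uj - e_start_uj
    if delta_uj < 0:
        delta_uj += max_range_uj  # correct a single wraparound
    return delta_uj / 1e6 / seconds  # microjoules -> joules, then joules/second


def sample_package_watts(interval: float = 1.0) -> float:
    """Read the RAPL counter twice, `interval` seconds apart, and return average watts."""
    max_range = int((RAPL_DOMAIN / "max_energy_range_uj").read_text())
    e0 = int((RAPL_DOMAIN / "energy_uj").read_text())
    t0 = time.monotonic()
    time.sleep(interval)
    e1 = int((RAPL_DOMAIN / "energy_uj").read_text())
    t1 = time.monotonic()
    return average_watts(e0, e1, t1 - t0, max_range)
```

For example, 50 J consumed over 0.5 s, i.e. average_watts(1_000_000, 51_000_000, 0.5, ...), gives 100.0 W. Only the CPU package (and, where exposed, DRAM) domains are covered, which is why such numbers understate total node power.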
The power and energy values can only be obtained from Slurm using
"scontrol show node xxx", although it would be desirable to have this
information also from sinfo (see Slurm bug 13083).
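Since the values are only available as KEY=VALUE fields in the "scontrol show node" output, a script has to parse that text. Below is a minimal Python sketch that does so. It assumes the power fields are named CurrentWatts and AveWatts and that unsupported values are printed as "n/s". Field names vary between Slurm versions, so check your own output. The helper names are hypothetical, and this is not the showpower implementation.

```python
from __future__ import annotations

import subprocess


def parse_node_record(text: str) -> dict[str, str]:
    """Split one node record from `scontrol show node` into a key -> value mapping.

    Tokens are whitespace-separated KEY=VALUE pairs; tokens without '=' are ignored.
    Values that contain spaces (e.g. a Reason=... text) are truncated by this split.
    """
    fields: dict[str, str] = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def to_watts(value: str | None) -> int | None:
    """Convert a power field to int, returning None for missing or non-numeric values such as 'n/s'."""
    if value is None or not value.isdigit():
        return None
    return int(value)


def node_power(node: str) -> dict[str, int | None]:
    """Query a single node (requires scontrol in PATH) and return its assumed power fields."""
    out = subprocess.run(["scontrol", "show", "node", node],
                         capture_output=True, text=True, check=True).stdout
    fields = parse_node_record(out)
    return {name: to_watts(fields.get(name)) for name in ("CurrentWatts", "AveWatts")}
```

Parsing a made-up record such as "NodeName=a001 CurrentWatts=245 AveWatts=230 ExtSensorsJoules=n/s" yields 245 and 230 for the two power fields and None for the unsupported sensor value. node_power() is kept to one node per call because a multi-node query returns several records that this simple parser would merge.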
To print the power and energy values for a set of nodes conveniently,
I've written a small script, showpower[2]. I'd appreciate suggestions
for improvements.
Question: Do other sites have experience with other AcctGatherEnergyType
settings such as ipmi or external sensors?
Best regards,
Ole
[1] https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#power-monitoring-and-management
[2] https://github.com/OleHolmNielsen/Slurm_tools/tree/master/nodes
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark