[slurm-users] Power saving method selection for different kinds of hardware

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Tue Nov 8 14:36:14 UTC 2022


I'm thinking about the best way to configure power saving (see 
https://slurm.schedmd.com/power_save.html) when we have different types of 
node hardware whose power state has to be managed differently:

1. Nodes with a BMC NIC through which "ipmitool chassis power ..." 
commands can be used.

2. Nodes where the BMC cannot be used for powering up due to the shared 
NICs going down when the node is off :-(

3. Cloud nodes where special cloud CLI commands must be used (such as 
Azure CLI).

slurm.conf permits only one SuspendProgram and one ResumeProgram, so these 
programs have to figure out which of the cases listed above applies to each 
node and perform the appropriate actions.

I was thinking of adding a node feature to indicate the kind of power control 
mechanism available, for example along these lines for the 3 cases above:

Nodename=node001 Feature=power_ipmi
Nodename=node002 Feature=power_none
Nodename=node003 Feature=power_azure

The SuspendProgram and ResumeProgram could then query each node's feature 
and branch to the matching power control commands.
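
To make the idea concrete, here is a rough sketch of what such a dispatcher 
might look like. It is only an illustration under several assumptions that 
are not part of the setup described above: the BMC of a node is assumed to 
resolve as <nodename>-bmc, the Azure resource group name is made up, the BMC 
credentials are left out, and power_none nodes are simply skipped. Since 
Slurm passes only the node list to the programs, the sketch assumes two 
small wrapper scripts that call it with "resume" or "suspend" as the first 
argument.

```python
#!/usr/bin/env python3
"""Sketch of a feature-driven SuspendProgram/ResumeProgram dispatcher."""
import re
import subprocess
import sys

# Hypothetical site settings, not taken from any real configuration.
BMC_SUFFIX = "-bmc"                   # assumes node001's BMC is node001-bmc
AZURE_RESOURCE_GROUP = "my-slurm-rg"  # made-up resource group name


def parse_features(scontrol_output):
    """Return the feature list found in 'scontrol show node <name>' output."""
    match = re.search(r"AvailableFeatures=(\S+)", scontrol_output)
    if not match or match.group(1) == "(null)":
        return []
    return match.group(1).split(",")


def power_method(features):
    """Pick the first power_* feature, falling back to power_none."""
    for feature in features:
        if feature.startswith("power_"):
            return feature
    return "power_none"


def build_command(method, node, action):
    """Return the argv list to run for this node, or None to do nothing.

    action is "resume" or "suspend".
    """
    if method == "power_ipmi":
        state = "on" if action == "resume" else "soft"
        # BMC credentials (-U/-P or -f) are omitted and must be added.
        return ["ipmitool", "-I", "lanplus", "-H", node + BMC_SUFFIX,
                "chassis", "power", state]
    if method == "power_azure":
        verb = "start" if action == "resume" else "deallocate"
        return ["az", "vm", verb, "--resource-group", AZURE_RESOURCE_GROUP,
                "--name", node]
    return None  # power_none or an unknown feature: leave the node alone


def main(action, hostlist):
    # Expand the hostlist expression (e.g. node[001-003]) into node names.
    nodes = subprocess.run(["scontrol", "show", "hostnames", hostlist],
                           check=True, capture_output=True,
                           text=True).stdout.split()
    for node in nodes:
        info = subprocess.run(["scontrol", "show", "node", node],
                              check=True, capture_output=True,
                              text=True).stdout
        command = build_command(power_method(parse_features(info)),
                                node, action)
        if command is not None:
            subprocess.run(command, check=False)


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Keeping the command selection in build_command() separate from the scontrol 
calls means the branching can be checked without touching any real node.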

Question: Has anyone thought of a similar or better way to handle power 
saving for different types of nodes?

Thanks,
Ole

-- 
Ole Holm Nielsen
Department of Physics, Technical University of Denmark,



