[slurm-users] [External] Power saving method selection for different kinds of hardware

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Mon Mar 27 18:32:22 UTC 2023


Hi Prentice,

Since my last message, I have figured out a way to implement power_save, and I've documented our setup in this Wiki page:
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_cloud_bursting/#configuring-slurm-conf-for-power-saving
This page contains a link to power_save scripts on GitHub.

Best regards,
Ole

On 27-03-2023 19:35, Prentice Bisbal wrote:
> I'm just catching up on old mailing list messages now. Why not make your 
> SuspendProgram and ResumeProgram shell scripts that look up some node 
> information in Slurm (e.g. the node features, as in your example) or 
> some other source (e.g. a case statement based on node names), and call 
> the correct suspend/resume command based on that?
> 
> I agree that attaching this metadata to the node definition and having 
> Slurm act on it directly would be the best solution, but in the meantime, 
> a shell script that figures out the correct way to suspend/resume each 
> host should be very doable, if not ideal.
> 
> Prentice
> 
> On 11/8/22 09:36, Ole Holm Nielsen wrote:
>> I'm thinking about the best way to configure power saving (see 
>> https://slurm.schedmd.com/power_save.html) when we have different 
>> types of node hardware whose power states have to be managed differently:
>>
>> 1. Nodes with a BMC NIC interface where "ipmitool chassis power ..." 
>> commands can be used.
>>
>> 2. Nodes where the BMC cannot be used for powering up, because the 
>> shared NICs go down when the node is powered off :-(
>>
>> 3. Cloud nodes where special cloud CLI commands must be used (such as 
>> Azure CLI).
>>
>> slurm.conf permits only one SuspendProgram and one ResumeProgram, 
>> which then need to distinguish the cases listed above and perform the 
>> appropriate actions.
>>
>> I was thinking of adding a node feature to indicate the kind of power 
>> control mechanism available, for example along these lines for the 3 
>> cases above:
>>
>> Nodename=node001 Feature=power_ipmi
>> Nodename=node002 Feature=power_none
>> Nodename=node003 Feature=power_azure
>>
>> The SuspendProgram and ResumeProgram could then query the node feature 
>> and jump to separate branches of the script containing the power 
>> control commands.
>>
>> Question: Has anyone thought of a similar or better way to handle 
>> power saving for different types of nodes?
>>
>> Thanks,
>> Ole
>>
> 
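The feature-based dispatch proposed in the original message, with the name-based case statement from the reply as a fallback, might look roughly like the Bash sketch below. Slurm runs SuspendProgram and ResumeProgram with a single argument, the hostlist expression of the affected nodes, which the script expands with "scontrol show hostnames". Everything else here is an assumption for illustration only, not taken from the power_save scripts linked from the wiki page: one script installed under two hypothetical symlink names (slurm_suspend and slurm_resume), BMCs reachable at "<node>-bmc" with credentials in IPMI_USER/IPMI_PASSWORD, Azure VM names equal to the Slurm node names in a hypothetical resource group AZ_RESOURCE_GROUP, and the made-up "cloud*"/"node*" name patterns.

```shell
#!/usr/bin/env bash
# slurm_power: hypothetical dispatcher used as both SuspendProgram and
# ResumeProgram via two symlinks (e.g. slurm_suspend, slurm_resume).
# Slurm calls it with one argument: a hostlist such as node[001-003].
# No "set -e": a failure on one node should not skip the remaining nodes.
set -uo pipefail

# DRY_RUN=1 prints the power commands instead of running them.
DRY_RUN="${DRY_RUN:-0}"

run() {
  if [[ "$DRY_RUN" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Available features of one node, e.g. "power_ipmi,gpu" ("(null)" if none).
node_features() {
  sinfo -h -N -n "$1" -o '%f' | head -n 1 | tr -d '[:space:]'
}

# Choose a power method: a power_* feature wins; otherwise fall back to a
# case statement on the node name (the name patterns are hypothetical).
power_method() {
  local node="$1"
  case ",$(node_features "$node")," in
    *,power_ipmi,*)  echo ipmi;  return ;;
    *,power_azure,*) echo azure; return ;;
    *,power_none,*)  echo none;  return ;;
  esac
  case "$node" in
    cloud*) echo azure ;;
    node*)  echo ipmi ;;
    *)      echo none ;;
  esac
}

# Suspend or resume a single node with the method chosen above.
power_node() {
  local action="$1" node="$2" ipmi_state az_verb
  case "$action" in
    suspend) ipmi_state=off; az_verb=deallocate ;;
    resume)  ipmi_state=on;  az_verb=start ;;
    *) echo "unknown action: $action" >&2; return 1 ;;
  esac
  case "$(power_method "$node")" in
    ipmi)
      # Assumes the BMC answers at "<node>-bmc"; -E reads IPMI_PASSWORD.
      run ipmitool -I lanplus -H "${node}-bmc" -U "${IPMI_USER:-admin}" -E \
        chassis power "$ipmi_state" ;;
    azure)
      # Assumes the Slurm node name equals the Azure VM name.
      run az vm "$az_verb" --resource-group "${AZ_RESOURCE_GROUP:-slurm-cloud}" \
        --name "$node" ;;
    none)
      # No usable power control: only log it.
      run logger -t slurm_power "no power control for $node; skipping $action" ;;
  esac
}

# Expand the hostlist and handle each node; stdin is redirected so that the
# power commands cannot consume the list of host names.
main() {
  local action="$1" hostlist="${2:-}" node
  [[ -n "$hostlist" ]] || return 0
  while read -r node; do
    power_node "$action" "$node" < /dev/null
  done < <(scontrol show hostnames "$hostlist")
}

# The action is taken from the (symlink) name the script was invoked under.
case "${0##*/}" in
  *suspend*) main suspend "${1:-}" ;;
  *resume*)  main resume "${1:-}" ;;
esac
```

With DRY_RUN=1 the script only prints the commands it would run, so the dispatch can be checked on a Slurm host with e.g. DRY_RUN=1 ./slurm_resume 'node[001-002]'. Nodes tagged power_none are merely logged here; listing them in SuspendExcNodes in slurm.conf would keep Slurm from trying to suspend them in the first place.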
