[slurm-users] (no subject)
benson_muite at emailplus.org
Thu Jul 28 16:43:51 UTC 2022
On 7/28/22 18:49, Djamil Lakhdar-Hamina wrote:
> I am helping set up a 16 node cluster computing system, I am not a
> system-admin but I work for a small firm and unfortunately have to pick
> up needed skills fast in things I have little experience in. I am
> running Rocky Linux 8 on Intel Xeon Knights Landings nodes donated by
> the TAAC center. We are operating in Uganda where we have limited
> resources and where power is quite expensive.
It may be helpful to check whether data center co-location is a
solution. Uganda generates a lot of hydroelectric power; distribution
is what increases the cost.
> What are some good ways to implement power-saving ?
Do you have the exact specifications of the host chips and accelerators?
> I have already tried
> power saving as per slurms power saving guide but 1) I am not quite sure
> what it does and 2) in implementing a version on my virtual dev
> environment I was able to get the power saving to stand down nodes, but
> I was not able to get the power saving mechanism to spin them back up
> when needed. I put power saving in the slurm.cfg file, and I also
> specified a SuspendProgram and a ResumeProgram similar to the one in the
> So 1) how do I get this power saving mechanism to work, what exactly
> will it do, I see it stands nodes down, will it spin them back up on
> request of those resources? 2) Are there any better techniques for power
> saving, say using IPMItool or something?
It may be helpful to measure power use directly for the most common
applications. You might also check if the system will be fully
utilized and, if not, enable jobs to run at off-peak times when energy
costs are lower.
Based on a price of $0.13 per kWh, full utilization 5 days a week, 8
hours a day, 52 weeks per year, and 500 W per node across the 16 nodes,
electricity is about $2000 per year.
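As a rough check of that estimate, here is a minimal sketch in Python.
It assumes 16 nodes drawing 500 W each, which is the reading consistent
with the roughly $2000 total; measured power draw should replace that
assumption once available. The name annual_cost is only illustrative.

```python
# Rough annual electricity cost for the cluster.
# All inputs are assumptions taken from the estimate in this message;
# 500 W per node is a guess until power use is measured directly.
PRICE_PER_KWH = 0.13            # USD per kWh
NODES = 16                      # nodes in the cluster
WATTS_PER_NODE = 500            # assumed average draw per node, in watts
HOURS_PER_YEAR = 8 * 5 * 52     # 8 h/day, 5 days/week, 52 weeks/year


def annual_cost(nodes, watts_per_node, hours, price_per_kwh):
    """Return the yearly electricity cost in USD."""
    kwh = nodes * watts_per_node / 1000 * hours   # energy used per year
    return kwh * price_per_kwh


cost = annual_cost(NODES, WATTS_PER_NODE, HOURS_PER_YEAR, PRICE_PER_KWH)
print(f"{HOURS_PER_YEAR} hours per year, about ${cost:.0f} per year")
```

Changing HOURS_PER_YEAR or WATTS_PER_NODE shows how much suspending idle
nodes or shifting jobs to cheaper hours would actually save.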
> Djamil Lakhdar-Hamina