[slurm-users] (no subject)

Djamil Lakhdar-Hamina dl2774 at columbia.edu
Thu Jul 28 15:49:49 UTC 2022


I am helping set up a 16-node compute cluster. I am not a system
administrator, but I work for a small firm and have to pick up the needed
skills quickly in areas where I have little experience. I am running
Rocky Linux 8 on Intel Xeon Phi (Knights Landing) nodes donated by the TAAC
center. We are operating in Uganda, where we have limited resources and
where power is quite expensive.

What are some good ways to implement power saving? I have already tried
following Slurm's power saving guide, but 1) I am not quite sure what it
does, and 2) when I implemented a version in my virtual dev environment, I
was able to get power saving to power nodes down, but I was not able to
get it to power them back up when they were needed. I enabled power saving
in slurm.conf and specified a SuspendProgram and a ResumeProgram similar
to the ones in https://slurm.schedmd.com/power_save.html.
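
For context, Slurm invokes the program named by ResumeProgram (or
SuspendProgram) with a hostlist expression such as node[01-04] as its
single argument. A minimal sketch of such a program follows. It is not the
script used here: the function names are hypothetical, and the power
action is a placeholder that only prints what it would do.

```python
#!/usr/bin/env python3
"""Sketch of a Slurm ResumeProgram (hypothetical, for illustration only)."""
import subprocess
import sys


def expand_hostlist(expr, run=subprocess.run):
    """Expand a hostlist like 'node[01-03]' using 'scontrol show hostnames'."""
    result = run(
        ["scontrol", "show", "hostnames", expr],
        check=True,
        capture_output=True,
        text=True,
    )
    # scontrol prints one hostname per line.
    return result.stdout.split()


def log_only(node):
    """Placeholder power action (assumption): replace with a real power-on."""
    print(f"would power on {node}")


def main(argv, power_action=log_only, run=subprocess.run):
    """Apply power_action to every node named in the hostlist argument."""
    if len(argv) != 2:
        print("usage: resume.py <hostlist>", file=sys.stderr)
        return 1
    for node in expand_hostlist(argv[1], run=run):
        power_action(node)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Two things worth checking when nodes are suspended but never come back:
these programs run on the slurmctld host as SlurmUser, so that account
needs permission to do whatever the power action requires, and a resumed
node must start slurmd and register within ResumeTimeout or it is marked
DOWN.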

So: 1) How do I get this power saving mechanism to work, and what exactly
does it do? I can see that it powers nodes down; will it also power them
back up when a job requests those resources? 2) Are there any better
techniques for power saving, for example using ipmitool?

Sincerely,
Djamil Lakhdar-Hamina