[slurm-users] [External] Hibernating a whole cluster

Florian Zillner fzillner at lenovo.com
Mon Feb 6 20:07:48 UTC 2023


Hi,

follow this guide: https://slurm.schedmd.com/power_save.html

Create poweroff / poweron scripts and configure Slurm to power nodes off after X minutes of idleness. This works well for us. Make sure to set an appropriate ResumeTimeout so the node has enough time to come back into service.
Note that we did not achieve good power savings by suspending the nodes; powering them off and on saves far more power. The downside is that it takes ~5 minutes to resume (i.e. power on) the nodes when needed.
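
To make that concrete, the pieces from the power_save guide boil down to something like the following. The paths, node names and timeout values are illustrative only, not our exact production config:

    # slurm.conf -- power saving (illustrative values)
    SuspendProgram=/usr/local/sbin/slurm_poweroff.sh
    ResumeProgram=/usr/local/sbin/slurm_poweron.sh
    SuspendTime=1800        # power nodes off after 30 min idle
    SuspendTimeout=120      # seconds the poweroff script may take
    ResumeTimeout=600       # seconds a node may take to boot and rejoin
    ReturnToService=2

    #!/bin/bash
    # /usr/local/sbin/slurm_poweroff.sh -- sketch only
    # Slurm passes a hostlist expression (e.g. "node[01-04]") as $1.
    for host in $(scontrol show hostnames "$1"); do
        # assumption: passwordless ssh/sudo from the controller;
        # use your BMC/IPMI tooling here instead if you have it
        ssh "$host" sudo systemctl poweroff
    done

The poweron script mirrors this, except that it triggers whatever mechanism (BMC/IPMI, wake-on-LAN, ...) powers the node back on.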

Cheers,
Florian
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Analabha Roy <hariseldon99 at gmail.com>
Sent: Monday, 6 February 2023 18:21
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
Subject: [External] [slurm-users] Hibernating a whole cluster

Hi,

I've just finished setting up a single-node "cluster" with Slurm on Ubuntu 20.04. Infrastructural limitations prevent me from running it 24/7, so it's only powered on during business hours.


Currently, I have a cron job running that hibernates that sole node before closing time.

The hibernation is done with standard systemd and writes the memory image to the swap partition.

I have not run any lengthy Slurm jobs on it yet. Before I do, can I get some thoughts on a couple of things?

If it hibernates while Slurm still has jobs running/queued, will they resume properly when the machine powers back on?

Note that my swap space is bigger than my RAM.

Is it perhaps necessary to set up a pre-hibernate script for systemd that iterates over the jobs with scontrol, suspending them all before hibernation and resuming them after wake-up?
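
For concreteness, what I have in mind is a systemd system-sleep hook along these lines (untested sketch on my part; the squeue options and job states are my guess at the right incantation):

    #!/bin/bash
    # /usr/lib/systemd/system-sleep/slurm-jobs.sh (must be executable)
    # systemd calls system-sleep hooks with $1 = pre|post and $2 = suspend|hibernate|...
    case "$1" in
        pre)
            # suspend every running job before the node hibernates
            for job in $(squeue -h -t RUNNING -o %A); do
                scontrol suspend "$job"
            done
            ;;
        post)
            # resume the suspended jobs once the node is back up
            for job in $(squeue -h -t SUSPENDED -o %A); do
                scontrol resume "$job"
            done
            ;;
    esac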

What about the wall times? I'm guessing that Slurm will count the downtime as elapsed time for each job. Is there a way to configure this, or is the only alternative a post-hibernate script that iteratively updates the wall times of the running jobs, again using scontrol?
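
If a script is the only way, I assume the post-hibernate half of the hook above could also hand the downtime back to the jobs, something like this (sketch; the +12:00:00 is a placeholder for the overnight gap, and I believe increasing a job's TimeLimit requires admin/root):

    # bump each running job's limit by the length of the downtime
    for job in $(squeue -h -t RUNNING -o %A); do
        scontrol update JobId="$job" TimeLimit=+12:00:00
    done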

Thanks for your attention.
Regards
AR