[slurm-users] Hibernating a whole cluster

Analabha Roy hariseldon99 at gmail.com
Mon Feb 6 17:21:55 UTC 2023


Hi,

I've just finished setting up a single-node "cluster" with Slurm on Ubuntu
20.04. Infrastructural limitations prevent me from running it 24/7, so
it's only powered on during business hours.


Currently, I have a cron job running that hibernates the sole node before
closing time.

The hibernation is done with standard systemd and goes to the swap
partition.
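
For reference, the cron entry is just a one-liner along these lines (the
18:00 weekday schedule here is only an example, not my actual closing
time):

  # hibernate the node at 18:00 on weekdays
  0 18 * * 1-5 /usr/bin/systemctl hibernate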

I have not run any lengthy Slurm jobs on it yet. Before I do, can I get
some thoughts on a couple of things?

If it hibernated while Slurm still had jobs running/queued, would they
resume properly when the machine powers back on?

Note that my swap space is bigger than my RAM.

Or would it be necessary to set up a pre-hibernate script for systemd
that uses scontrol to suspend all the jobs before hibernating and then
resume them post-resume?
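
For concreteness, here is the kind of hook I have in mind (an untested
sketch; my understanding is that systemd runs executables from
/lib/systemd/system-sleep/ with "pre" or "post" as the first argument
and the sleep action as the second, and the script name is hypothetical):

  #!/bin/bash
  # /lib/systemd/system-sleep/slurm-jobs.sh (hypothetical name)
  # systemd calls this with $1 = pre|post and $2 = suspend|hibernate|...
  case "$1/$2" in
    pre/hibernate)
      # Suspend every running job before the node hibernates.
      for jobid in $(squeue --states=RUNNING --noheader --format=%i); do
        scontrol suspend "$jobid"
      done
      ;;
    post/hibernate)
      # Resume the suspended jobs once the node is back up. (This naively
      # resumes everything in SUSPENDED state, including jobs a user may
      # have suspended on purpose.)
      for jobid in $(squeue --states=SUSPENDED --noheader --format=%i); do
        scontrol resume "$jobid"
      done
      ;;
  esac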

What about the wall times? I'm guessing that Slurm will count the
downtime as elapsed time for each job. Is there a way to configure this,
or is the only alternative a post-hibernate script that iteratively
extends the time limits of the running jobs, again using scontrol?
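
If a post-hibernate script is the answer, I'm picturing something like
this (again untested; it assumes scontrol accepts a '+' prefix on
TimeLimit to add minutes to a job's current limit, and that it runs as
root or the SlurmUser, since only they can raise a time limit):

  #!/bin/bash
  # Hypothetical helper: give each running job back the minutes lost
  # to the hibernation window.
  downtime_min=${1:?usage: $0 <downtime-in-minutes>}
  for jobid in $(squeue --states=RUNNING --noheader --format=%i); do
    scontrol update JobId="$jobid" TimeLimit=+"$downtime_min"
  done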

Thanks for your attention.
Regards
AR