[slurm-users] Hibernating a whole cluster
Analabha Roy
hariseldon99 at gmail.com
Mon Feb 6 17:21:55 UTC 2023
Hi,
I've just finished setting up a single-node "cluster" with Slurm on Ubuntu
20.04. Infrastructural limitations prevent me from running it 24/7, so
it's only powered on during business hours.
Currently, I have a cron job running that hibernates that sole node before
closing time.
The hibernation is done with standard systemd and goes to the swap
partition.
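For concreteness, the cron job is essentially of this form (the schedule
shown is illustrative, not my actual closing time):

    # root's crontab: hibernate the node at 17:30 on weekdays
    30 17 * * 1-5    /usr/bin/systemctl hibernate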
I have not run any lengthy Slurm jobs on it yet. Before I do, can I get
some thoughts on a couple of things?
If it hibernated while Slurm still had jobs running or queued, would they
resume properly when the machine powers back on?
Note that my swap space is bigger than my RAM, so the full memory image
should fit.
Is it necessary to set up a pre-hibernate script for systemd that iterates
over the queue with scontrol, suspending all jobs before hibernation and
resuming them post-resume? Something like the hook sketched below is what
I have in mind.
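This is only a sketch, assuming the standard systemd-sleep hook interface
(an executable dropped in /usr/lib/systemd/system-sleep/ that systemd calls
with $1 = "pre" or "post" and $2 = the sleep action); the file name is a
placeholder of mine:

    #!/bin/sh
    # /usr/lib/systemd/system-sleep/slurm-jobs (must be executable)
    case "$1" in
        pre)
            # Suspend all running jobs before the machine hibernates.
            for jobid in $(squeue --noheader --states=RUNNING --format=%i); do
                scontrol suspend "$jobid"
            done
            ;;
        post)
            # Resume them once the machine wakes up.
            for jobid in $(squeue --noheader --states=SUSPENDED --format=%i); do
                scontrol resume "$jobid"
            done
            ;;
    esac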
What about the wall times? I'm guessing that Slurm will count the downtime
as elapsed for each job. Is there a way to configure this, or is the only
alternative a post-hibernate script that iteratively updates the wall times
of the running jobs using scontrol again?
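If it comes to that, I imagine the adjustment would look roughly like the
following (scontrol accepts a +/- prefix to increment or decrement a job's
TimeLimit in minutes; DOWNTIME_MINUTES is a placeholder I'd compute from a
timestamp written out by the pre-hibernate hook):

    # Extend each running job's time limit by the downtime.
    DOWNTIME_MINUTES=60
    for jobid in $(squeue --noheader --states=RUNNING --format=%i); do
        scontrol update JobId="$jobid" TimeLimit=+"$DOWNTIME_MINUTES"
    done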
Thanks for your attention.
Regards
AR