<div dir="auto"><div><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 7 Feb 2023, 18:12 Diego Zuccato, <<a href="mailto:diego.zuccato@unibo.it">diego.zuccato@unibo.it</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">RAM used by a suspended job is not released. At most it can be swapped <br>
out (if enough swap is available).<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto">There should be enough swap available. I have 93 gigs of Ram and as big a swap partition. I can top it off with swap files if needed. </div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
<br>
Il 07/02/2023 13:14, Analabha Roy ha scritto:<br>
> Hi Sean,<br>
> <br>
> Thanks for your awesome suggestion! I'm going through the reservation <br>
> docs now. At first glance, it seems like a daily reservation would turn <br>
> down jobs that are too big for the reservation. It'd be nice if<br>
> slurm could suspend (in the manner of 'scontrol suspend') jobs during <br>
> reserved downtime and resume them after. That way, folks can submit <br>
> large jobs without having to worry about the downtimes. Perhaps the FLEX <br>
> option in reservations can accomplish this somehow?<br>
> <br>
> <br>
> I suppose that I can do it using a shell script iterator and a cron job, <br>
> but that seems like an ugly hack. I was hoping if there is a way to <br>
> config this in slurm itself?<br>
> <br>
> AR<br>
> <br>
> On Tue, 7 Feb 2023 at 16:06, Sean Mc Grath <<a href="mailto:smcgrat@tcd.ie" target="_blank" rel="noreferrer">smcgrat@tcd.ie</a> <br>
> <mailto:<a href="mailto:smcgrat@tcd.ie" target="_blank" rel="noreferrer">smcgrat@tcd.ie</a>>> wrote:<br>
> <br>
>     Hi Analabha,<br>
> <br>
>     Could you do something like create a daily reservation for 8 hours<br>
>     that starts at 9am, or whatever times work for you like the<br>
>     following untested command:<br>
> <br>
>     scontrol create reservation starttime=09:00:00 duration=8:00:00<br>
>     nodecnt=1 flags=daily ReservationName=daily<br>
> <br>
>     Daily option at <a href="https://slurm.schedmd.com/scontrol.html#OPT_DAILY" rel="noreferrer noreferrer" target="_blank">https://slurm.schedmd.com/scontrol.html#OPT_DAILY</a><br>
>     <<a href="https://slurm.schedmd.com/scontrol.html#OPT_DAILY" rel="noreferrer noreferrer" target="_blank">https://slurm.schedmd.com/scontrol.html#OPT_DAILY</a>><br>
> <br>
>     Some more possible helpful documentation at<br>
>     <a href="https://slurm.schedmd.com/reservations.html" rel="noreferrer noreferrer" target="_blank">https://slurm.schedmd.com/reservations.html</a><br>
>     <<a href="https://slurm.schedmd.com/reservations.html" rel="noreferrer noreferrer" target="_blank">https://slurm.schedmd.com/reservations.html</a>>, search for "daily".<br>
> <br>
>     My idea being that jobs can only run in that reservation, (that<br>
>     would have to be configured separately, not sure how from the top of<br>
>     my head), which is only active during the times you want the node to<br>
>     be working. So the cronjob that hibernates/shuts it down will do so<br>
>     when there are no jobs running. At least in theory.<br>
> <br>
>     Hope that helps.<br>
> <br>
>     Sean<br>
> <br>
>     ---<br>
>     Sean McGrath<br>
>     Senior Systems Administrator, IT Services<br>
> <br>
>     ------------------------------------------------------------------------<br>
>     *From:* slurm-users <<a href="mailto:slurm-users-bounces@lists.schedmd.com" target="_blank" rel="noreferrer">slurm-users-bounces@lists.schedmd.com</a><br>
>     <mailto:<a href="mailto:slurm-users-bounces@lists.schedmd.com" target="_blank" rel="noreferrer">slurm-users-bounces@lists.schedmd.com</a>>> on behalf of<br>
>     Analabha Roy <<a href="mailto:hariseldon99@gmail.com" target="_blank" rel="noreferrer">hariseldon99@gmail.com</a> <mailto:<a href="mailto:hariseldon99@gmail.com" target="_blank" rel="noreferrer">hariseldon99@gmail.com</a>>><br>
>     *Sent:* Tuesday 7 February 2023 10:05<br>
>     *To:* Slurm User Community List <<a href="mailto:slurm-users@lists.schedmd.com" target="_blank" rel="noreferrer">slurm-users@lists.schedmd.com</a><br>
>     <mailto:<a href="mailto:slurm-users@lists.schedmd.com" target="_blank" rel="noreferrer">slurm-users@lists.schedmd.com</a>>><br>
>     *Subject:* Re: [slurm-users] [External] Hibernating a whole cluster<br>
>     Hi,<br>
> <br>
>     Thanks. I had read the Slurm Power Saving Guide before. I believe<br>
>     the configs enable slurmctld to check other nodes for idleness and<br>
>     suspend/resume them. Slurmctld must run on a separate, always-on<br>
>     server for this to work, right?<br>
> <br>
>     My issue might be a little different. I literally have only one node<br>
>     that runs everything: slurmctld, slurmd, slurmdbd, everything.<br>
> <br>
>     This node must be set to "sudo systemctl hibernate"after business<br>
>     hours, regardless of whether jobs are queued or running. The next<br>
>     business day, it can be switched on manually.<br>
> <br>
>     systemctl hibernate is supposed to save the entire run state of the<br>
>     sole node to swap and poweroff. When powered on again, it should<br>
>     restore everything to its previous running state.<br>
> <br>
>     When the job queue is empty, this works well. I'm not sure how well<br>
>     this hibernate/resume will work with running jobs and would<br>
>     appreciate any suggestions or insights.<br>
> <br>
>     AR<br>
> <br>
> <br>
>     On Tue, 7 Feb 2023 at 01:39, Florian Zillner <<a href="mailto:fzillner@lenovo.com" target="_blank" rel="noreferrer">fzillner@lenovo.com</a><br>
>     <mailto:<a href="mailto:fzillner@lenovo.com" target="_blank" rel="noreferrer">fzillner@lenovo.com</a>>> wrote:<br>
> <br>
>         Hi,<br>
> <br>
>         follow this guide: <a href="https://slurm.schedmd.com/power_save.html" rel="noreferrer noreferrer" target="_blank">https://slurm.schedmd.com/power_save.html</a><br>
>         <<a href="https://slurm.schedmd.com/power_save.html" rel="noreferrer noreferrer" target="_blank">https://slurm.schedmd.com/power_save.html</a>><br>
> <br>
>         Create poweroff / poweron scripts and configure slurm to do the<br>
>         poweroff after X minutes. Works well for us. Make sure to set an<br>
>         appropriate time (ResumeTimeout) to allow the node to come back<br>
>         to service.<br>
>         Note that we did not achieve good power saving with suspending<br>
>         the nodes, powering them off and on saves way more power. The<br>
>         downside is it takes ~ 5 mins to resume (= power on) the nodes<br>
>         when needed.<br>
> <br>
>         Cheers,<br>
>         Florian<br>
>         ------------------------------------------------------------------------<br>
>         *From:* slurm-users <<a href="mailto:slurm-users-bounces@lists.schedmd.com" target="_blank" rel="noreferrer">slurm-users-bounces@lists.schedmd.com</a><br>
>         <mailto:<a href="mailto:slurm-users-bounces@lists.schedmd.com" target="_blank" rel="noreferrer">slurm-users-bounces@lists.schedmd.com</a>>> on behalf of<br>
>         Analabha Roy <<a href="mailto:hariseldon99@gmail.com" target="_blank" rel="noreferrer">hariseldon99@gmail.com</a><br>
>         <mailto:<a href="mailto:hariseldon99@gmail.com" target="_blank" rel="noreferrer">hariseldon99@gmail.com</a>>><br>
>         *Sent:* Monday, 6 February 2023 18:21<br>
>         *To:* <a href="mailto:slurm-users@lists.schedmd.com" target="_blank" rel="noreferrer">slurm-users@lists.schedmd.com</a><br>
>         <mailto:<a href="mailto:slurm-users@lists.schedmd.com" target="_blank" rel="noreferrer">slurm-users@lists.schedmd.com</a>><br>
>         <<a href="mailto:slurm-users@lists.schedmd.com" target="_blank" rel="noreferrer">slurm-users@lists.schedmd.com</a><br>
>         <mailto:<a href="mailto:slurm-users@lists.schedmd.com" target="_blank" rel="noreferrer">slurm-users@lists.schedmd.com</a>>><br>
>         *Subject:* [External] [slurm-users] Hibernating a whole cluster<br>
>         Hi,<br>
> <br>
>         I've just finished  setup of a single node "cluster" with slurm<br>
>         on ubuntu 20.04. Infrastructural limitations  prevent me from<br>
>         running it 24/7, and it's only powered on during business hours.<br>
> <br>
> <br>
>         Currently, I have a cron job running that hibernates that sole<br>
>         node before closing time.<br>
> <br>
>         The hibernation is done with standard systemd, and hibernates to<br>
>         the swap partition.<br>
> <br>
>           I have not run any lengthy slurm jobs on it yet. Before I do,<br>
>         can I get some thoughts on a couple of things?<br>
> <br>
>         If it hibernated when slurm still had jobs running/queued, would<br>
>         they resume properly when the machine powers back on?<br>
> <br>
>         Note that my swap space is bigger than my  RAM.<br>
> <br>
>         Is it necessary to perhaps setup a pre-hibernate script for<br>
>         systemd to  iterate scontrol to suspend all the jobs before<br>
>         hibernating and resume them post-resume?<br>
> <br>
>         What about the wall times? I'm uessing that slurm will count the<br>
>         downtime as elapsed for each job. Is there a way to config this,<br>
>         or is the only alternative a post-hibernate script that<br>
>         iteratively updates the wall times of the running jobs using<br>
>         scontrol again?<br>
> <br>
>         Thanks for your attention.<br>
>         Regards<br>
>         AR<br>
> <br>
> <br>
> <br>
>     -- <br>
>     Analabha Roy<br>
>     Assistant Professor<br>
>     Department of Physics<br>
>     <<a href="http://www.buruniv.ac.in/academics/department/physics" rel="noreferrer noreferrer" target="_blank">http://www.buruniv.ac.in/academics/department/physics</a>><br>
>     The University of Burdwan <<a href="http://www.buruniv.ac.in/" rel="noreferrer noreferrer" target="_blank">http://www.buruniv.ac.in/</a>><br>
>     Golapbag Campus, Barddhaman 713104<br>
>     West Bengal, India<br>
>     Emails: <a href="mailto:daneel@utexas.edu" target="_blank" rel="noreferrer">daneel@utexas.edu</a> <mailto:<a href="mailto:daneel@utexas.edu" target="_blank" rel="noreferrer">daneel@utexas.edu</a>>,<br>
>     <a href="mailto:aroy@phys.buruniv.ac.in" target="_blank" rel="noreferrer">aroy@phys.buruniv.ac.in</a> <mailto:<a href="mailto:aroy@phys.buruniv.ac.in" target="_blank" rel="noreferrer">aroy@phys.buruniv.ac.in</a>>,<br>
>     <a href="mailto:hariseldon99@gmail.com" target="_blank" rel="noreferrer">hariseldon99@gmail.com</a> <mailto:<a href="mailto:hariseldon99@gmail.com" target="_blank" rel="noreferrer">hariseldon99@gmail.com</a>><br>
>     Webpage: <a href="http://www.ph.utexas.edu/~daneel/" rel="noreferrer noreferrer" target="_blank">http://www.ph.utexas.edu/~daneel/</a><br>
>     <<a href="http://www.ph.utexas.edu/~daneel/" rel="noreferrer noreferrer" target="_blank">http://www.ph.utexas.edu/~daneel/</a>><br>
> <br>
> <br>
> <br>
> -- <br>
> Analabha Roy<br>
> Assistant Professor<br>
> Department of Physics <br>
> <<a href="http://www.buruniv.ac.in/academics/department/physics" rel="noreferrer noreferrer" target="_blank">http://www.buruniv.ac.in/academics/department/physics</a>><br>
> The University of Burdwan <<a href="http://www.buruniv.ac.in/" rel="noreferrer noreferrer" target="_blank">http://www.buruniv.ac.in/</a>><br>
> Golapbag Campus, Barddhaman 713104<br>
> West Bengal, India<br>
> Emails: <a href="mailto:daneel@utexas.edu" target="_blank" rel="noreferrer">daneel@utexas.edu</a> <mailto:<a href="mailto:daneel@utexas.edu" target="_blank" rel="noreferrer">daneel@utexas.edu</a>>, <br>
> <a href="mailto:aroy@phys.buruniv.ac.in" target="_blank" rel="noreferrer">aroy@phys.buruniv.ac.in</a> <mailto:<a href="mailto:aroy@phys.buruniv.ac.in" target="_blank" rel="noreferrer">aroy@phys.buruniv.ac.in</a>>, <br>
> <a href="mailto:hariseldon99@gmail.com" target="_blank" rel="noreferrer">hariseldon99@gmail.com</a> <mailto:<a href="mailto:hariseldon99@gmail.com" target="_blank" rel="noreferrer">hariseldon99@gmail.com</a>><br>
> Webpage: <a href="http://www.ph.utexas.edu/~daneel/" rel="noreferrer noreferrer" target="_blank">http://www.ph.utexas.edu/~daneel/</a> <br>
> <<a href="http://www.ph.utexas.edu/~daneel/" rel="noreferrer noreferrer" target="_blank">http://www.ph.utexas.edu/~daneel/</a>><br>
<br>
-- <br>
Diego Zuccato<br>
DIFA - Dip. di Fisica e Astronomia<br>
Servizi Informatici<br>
Alma Mater Studiorum - Università di Bologna<br>
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy<br>
tel.: +39 051 20 95786<br>
</blockquote></div></div></div>