<div dir="ltr"><div>Hi Sean,</div><div><br></div><div>Thanks for your awesome suggestion! I'm going through the reservation docs now. At first glance, it seems that a daily reservation would turn down jobs that are too big to fit in the reservation. It'd be nice if Slurm could suspend jobs (in the manner of 'scontrol suspend') during reserved downtime and resume them afterwards. That way, folks can submit large jobs without having to worry about the downtimes. Perhaps the FLEX option in reservations can accomplish this somehow?</div><div><br></div><div>I suppose that I can do it using a shell script that iterates over the jobs and a cron job, but that seems like an ugly hack. I was hoping there is a way to configure this in Slurm itself.</div><div><br></div><div>AR</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 7 Feb 2023 at 16:06, Sean Mc Grath <<a href="mailto:smcgrat@tcd.ie">smcgrat@tcd.ie</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg-2766631841793518417">
<div dir="ltr">
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
Hi Analabha,</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
Could you create a daily reservation for 8 hours that starts at 9am (or whatever times work for you), along the lines of the following untested command?<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
scontrol create reservation starttime=09:00:00 duration=8:00:00 nodecnt=1 flags=daily ReservationName=daily
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
Daily option at <a href="https://slurm.schedmd.com/scontrol.html#OPT_DAILY" target="_blank">https://slurm.schedmd.com/scontrol.html#OPT_DAILY</a><br>
</div>
<div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Some more possibly helpful documentation is at <a href="https://slurm.schedmd.com/reservations.html" target="_blank">
https://slurm.schedmd.com/reservations.html</a>; search for "daily".</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
My idea is that jobs can only run in that reservation (that would have to be configured separately; I'm not sure how off the top of my head), which is only active during the times you want the node to be working. The cron job that hibernates or shuts it down
will then do so when there are no jobs running. At least in theory.</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hope that helps.</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Sean<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div id="m_-2766631841793518417Signature">
<div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
---</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
Sean McGrath</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
Senior Systems Administrator, IT Services</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<br>
</div>
</div>
</div>
</div>
<div id="m_-2766631841793518417appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_-2766631841793518417divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> slurm-users <<a href="mailto:slurm-users-bounces@lists.schedmd.com" target="_blank">slurm-users-bounces@lists.schedmd.com</a>> on behalf of Analabha Roy <<a href="mailto:hariseldon99@gmail.com" target="_blank">hariseldon99@gmail.com</a>><br>
<b>Sent:</b> Tuesday 7 February 2023 10:05<br>
<b>To:</b> Slurm User Community List <<a href="mailto:slurm-users@lists.schedmd.com" target="_blank">slurm-users@lists.schedmd.com</a>><br>
<b>Subject:</b> Re: [slurm-users] [External] Hibernating a whole cluster</font>
<div> </div>
</div>
<div>
<div dir="ltr">Hi,
<div><br>
</div>
<div>Thanks. I had read the Slurm Power Saving Guide before. I believe the configs enable slurmctld to check other nodes for idleness and suspend/resume them. Slurmctld must run on a separate, always-on server for this to work, right?<br>
</div>
<div><br>
</div>
<div>My issue might be a little different. I literally have only one node that runs everything: slurmctld, slurmd, slurmdbd, everything.</div>
<div><br>
</div>
<div>This node must be hibernated with "sudo systemctl hibernate" after business hours, regardless of whether jobs are queued or running. The next business day, it can be switched on manually.</div>
<div><br>
</div>
<div>systemctl hibernate is supposed to save the entire run state of the sole node to swap and power off. When powered on again, it should restore everything to its previous running state.<br>
</div>
<div><br>
</div>
<div>When the job queue is empty, this works well. I'm not sure how well this hibernate/resume will work with running jobs and would appreciate any suggestions or insights.</div>
<div><br>
</div>
<div>AR</div>
<div><br>
</div>
</div>
<br>
<div>
<div dir="ltr">On Tue, 7 Feb 2023 at 01:39, Florian Zillner <<a href="mailto:fzillner@lenovo.com" target="_blank">fzillner@lenovo.com</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
Hi,</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
follow this guide: <a href="https://slurm.schedmd.com/power_save.html" id="m_-2766631841793518417x_m_-5976585725710474458LPlnk920013" target="_blank">https://slurm.schedmd.com/power_save.html</a></div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
Create poweroff / poweron scripts and configure slurm to do the poweroff after X minutes. Works well for us. Make sure to set an appropriate time (ResumeTimeout) to allow the node to come back to service.</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
Note that we did not achieve good power savings by suspending the nodes; powering them off and on saves far more power. The downside is that it takes about 5 minutes to resume (= power on) the nodes when needed.</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
Cheers,</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
Florian</div>
<div id="m_-2766631841793518417x_m_-5976585725710474458appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_-2766631841793518417x_m_-5976585725710474458divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> slurm-users <<a href="mailto:slurm-users-bounces@lists.schedmd.com" target="_blank">slurm-users-bounces@lists.schedmd.com</a>>
on behalf of Analabha Roy <<a href="mailto:hariseldon99@gmail.com" target="_blank">hariseldon99@gmail.com</a>><br>
<b>Sent:</b> Monday, 6 February 2023 18:21<br>
<b>To:</b> <a href="mailto:slurm-users@lists.schedmd.com" target="_blank">slurm-users@lists.schedmd.com</a> <<a href="mailto:slurm-users@lists.schedmd.com" target="_blank">slurm-users@lists.schedmd.com</a>><br>
<b>Subject:</b> [External] [slurm-users] Hibernating a whole cluster</font>
<div> </div>
</div>
<div>
<div dir="auto">Hi,
<div dir="auto"><br>
</div>
<div dir="auto">I've just finished setting up a single-node "cluster" with Slurm on Ubuntu 20.04. Infrastructural limitations prevent me from running it 24/7, and it's only powered on during business hours.</div>
<div dir="auto"><br>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">Currently, I have a cron job running that hibernates that sole node before closing time.</div>
<div dir="auto"><br>
</div>
<div dir="auto">The hibernation is done with standard systemd, and hibernates to the swap partition.</div>
<div dir="auto"><br>
</div>
<div dir="auto"> I have not run any lengthy slurm jobs on it yet. Before I do, can I get some thoughts on a couple of things?</div>
<div dir="auto"><br>
</div>
<div dir="auto">If it hibernated when slurm still had jobs running/queued, would they resume properly when the machine powers back on? </div>
<div dir="auto"><br>
</div>
<div dir="auto">Note that my swap space is bigger than my RAM. </div>
<div dir="auto"><br>
</div>
<div dir="auto">Is it perhaps necessary to set up a pre-hibernate script for systemd that iterates over the jobs with scontrol to suspend them all before hibernating, and resumes them after the machine comes back? </div>
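<div dir="auto"><br>
</div>
<div dir="auto">A minimal sketch of such a hook, assuming the systemd-sleep convention that executables placed in /usr/lib/systemd/system-sleep/ are run with "pre" or "post" as the first argument and the sleep action (for example "hibernate") as the second. The function names and the file name are hypothetical, the script is untested on a real cluster, and it assumes that squeue and scontrol are on root's PATH and that slurmctld responds when the hook runs.</div>
<pre>
```shell
#!/bin/sh
# Sketch of a systemd system-sleep hook (hypothetical file name:
# /usr/lib/systemd/system-sleep/slurm-jobs). systemd calls it with
# $1 = "pre" or "post" and $2 = the sleep action, e.g. "hibernate".

# Suspend every job that squeue reports as RUNNING.
suspend_running_jobs() {
    squeue -h -t RUNNING -o %A | while read -r jobid; do
        scontrol suspend "$jobid"
    done
}

# Resume every job that squeue reports as SUSPENDED.
resume_suspended_jobs() {
    squeue -h -t SUSPENDED -o %A | while read -r jobid; do
        scontrol resume "$jobid"
    done
}

# Dispatch on the two arguments systemd passes to the hook.
slurm_sleep_hook() {
    case "$1/$2" in
        pre/hibernate)  suspend_running_jobs ;;
        post/hibernate) resume_suspended_jobs ;;
    esac
}

slurm_sleep_hook "$@"
```
</pre>
<div dir="auto">As far as I can tell from the scontrol man page, time spent suspended is not counted against a job's time limit, so this may also cover the wall-time concern. Note that the "post" step resumes every suspended job, including any that were suspended by hand before hibernation.</div>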
<div dir="auto"><br>
</div>
<div dir="auto">What about the wall times? I'm guessing that Slurm will count the downtime as elapsed for each job. Is there a way to configure this, or is the only alternative a post-hibernate script that iteratively updates the wall times of the running jobs
using scontrol again? </div>
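<div dir="auto"><br>
</div>
<div dir="auto">If the jobs are not suspended through Slurm, a post-resume step along the following lines could add the downtime back onto each running job's limit. It relies on the increment form of scontrol update (TimeLimit+=MINUTES, with JobId given first, as described in the scontrol man page). The helper names are hypothetical, and measuring the downtime from two epoch timestamps is an assumption; raising a time limit also needs administrator privileges.</div>
<pre>
```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and this is untested
# against a real slurmctld.

# Whole minutes between two epoch timestamps, rounded up.
# Usage: downtime_minutes START_EPOCH END_EPOCH
downtime_minutes() {
    echo $(( ($2 - $1 + 59) / 60 ))
}

# Add MINUTES to the time limit of every job squeue reports as RUNNING.
# Usage: extend_running_jobs MINUTES
extend_running_jobs() {
    squeue -h -t RUNNING -o %A | while read -r jobid; do
        # JobId must precede TimeLimit for the increment form to apply.
        scontrol update "JobId=$jobid" "TimeLimit+=$1"
    done
}

# Example call, assuming "before" and "after" were recorded with
# "date +%s" just before hibernating and just after resuming:
#   extend_running_jobs "$(downtime_minutes "$before" "$after")"
```
</pre>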
<div dir="auto"><br>
</div>
<div dir="auto">Thanks for your attention. </div>
<div dir="auto">Regards </div>
<div dir="auto">AR</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>Analabha Roy<br>
</div>
<div>Assistant Professor</div>
<div><a href="http://www.buruniv.ac.in/academics/department/physics" target="_blank">Department of Physics</a></div>
<div><a href="http://www.buruniv.ac.in/" target="_blank">The University of Burdwan</a></div>
<div>Golapbag Campus, Barddhaman 713104</div>
<div>West Bengal, India</div>
<div>Emails: <a href="mailto:daneel@utexas.edu" target="_blank">daneel@utexas.edu</a>,
<a href="mailto:aroy@phys.buruniv.ac.in" target="_blank">aroy@phys.buruniv.ac.in</a>,
<a href="mailto:hariseldon99@gmail.com" target="_blank">hariseldon99@gmail.com</a><br>
<div><font face="tahoma, sans-serif">Webpage: <a href="http://www.ph.utexas.edu/~daneel/" target="_blank">
http://www.ph.utexas.edu/~daneel/</a></font></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div></blockquote></div></div>