<div dir="ltr"><div>The dual-QoS approach (or the dual-partition solution suggested by someone else) should both work for allowing select users to submit jobs with longer run times. We use something like that on our cluster (though I confess it was our first Slurm cluster and we might have overdone it with QoSes, causing the scheduler to work harder). But for the simple case you have, the only downside I see is the potential extra work in creating user associations, etc., which is not a problem if scripted.</div><div><br></div><div>I am not sure if either would work for extending the run time of running jobs, though I expect it might be possible with the QoS approach. (I think it is far more likely that one can change the QoS of a running job than its partition, even if both partitions consist of the same set of nodes.) I am also not sure whether a user can do that or whether it would require sysadmin involvement. <br></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jul 3, 2019 at 11:52 AM David Baker &lt;<a href="mailto:D.J.Baker@soton.ac.uk">D.J.Baker@soton.ac.uk</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div id="gmail-m_-3700188595087077766divtagdefaultwrapper" dir="ltr" style="font-size:12pt;color:rgb(0,0,0);font-family:Calibri,Helvetica,sans-serif,EmojiFont,"Apple Color Emoji","Segoe UI Emoji",NotoColorEmoji,"Segoe UI Symbol","Android Emoji",EmojiSymbols">
<p style="margin-top:0px;margin-bottom:0px">Hello,</p>
<p style="margin-top:0px;margin-bottom:0px"><br>
</p>
<p style="margin-top:0px;margin-bottom:0px">A few of our users have asked about running longer jobs on our cluster. Currently our main/default compute partition has a time limit of 2.5 days. Potentially, a handful of users need jobs to run for up to 5 days. Rather
than allow all users/jobs to have a run time limit of 5 days, I wondered if the following scheme makes sense...</p>
<p style="margin-top:0px;margin-bottom:0px"><br>
</p>
<p style="margin-top:0px;margin-bottom:0px">Increase the max run time on the default partition to 5 days, but limit most users to a max of 2.5 days using the default "normal" QOS. </p>
<p style="margin-top:0px;margin-bottom:0px"><br>
</p>
<p style="margin-top:0px;margin-bottom:0px">Create a QOS called "long" with a max time limit of 5 days. Limit the users who can use "long". For authorized users, assign the "long" QOS to their jobs on the basis of their run time request. </p>
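<p style="margin-top:0px;margin-bottom:0px">As a rough illustration of the scheme above, the routing rule could be modelled like this. It is only a sketch in Python, not Slurm configuration: the function names, the set of authorized users, and the choice to reject over-limit requests are all assumptions made for the example. On a real cluster this kind of logic would more likely live in a job submit plugin or be enforced through QOS limits.</p>

```python
# Illustrative model only: names and limits below are assumptions taken
# from the scheme described in this message, not from any Slurm API.
NORMAL_MAX_MIN = 60 * 60       # 2.5 days = 60 hours = 3600 minutes
LONG_MAX_MIN = 5 * 24 * 60     # 5 days = 7200 minutes


def parse_time_limit(text):
    """Convert a Slurm-style time string to whole minutes, rounding seconds up.

    Handles: M, M:S, H:M:S, D-H, D-H:M, D-H:M:S.
    """
    days = hours = minutes = seconds = 0
    if "-" in text:
        day_part, rest = text.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in rest.split(":")]
        parts += [0] * (3 - len(parts))   # pad to hours, minutes, seconds
        hours, minutes, seconds = parts
    else:
        parts = [int(p) for p in text.split(":")]
        if len(parts) == 1:
            minutes = parts[0]
        elif len(parts) == 2:
            minutes, seconds = parts
        else:
            hours, minutes, seconds = parts
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return (total_seconds + 59) // 60


def choose_qos(user, time_limit, long_users):
    """Pick "normal" or "long" from the requested run time (hypothetical helper)."""
    requested = parse_time_limit(time_limit)
    if requested > LONG_MAX_MIN:
        raise ValueError("request exceeds the 5 day limit")
    if requested > NORMAL_MAX_MIN:
        if user not in long_users:
            raise ValueError("user is not authorized for the long QOS")
        return "long"
    return "normal"
```

<p style="margin-top:0px;margin-bottom:0px">For example, with a hypothetical authorized set containing only "bob", a request of 5-00:00:00 from "bob" maps to "long", a request of 2-12:00:00 from anyone maps to "normal", and a 3 day request from any other user is rejected.</p>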
<p style="margin-top:0px;margin-bottom:0px"><br>
</p>
<p style="margin-top:0px;margin-bottom:0px">Does the above make sense or is it too complicated? If the above works, could users limited to the normal QOS have the run time of their running jobs increased to 5 days in exceptional circumstances?</p>
<p style="margin-top:0px;margin-bottom:0px"><br>
</p>
<p style="margin-top:0px;margin-bottom:0px">I would be interested in your thoughts, please.</p>
<p style="margin-top:0px;margin-bottom:0px"><br>
</p>
<p style="margin-top:0px;margin-bottom:0px">Best regards,</p>
<p style="margin-top:0px;margin-bottom:0px">David</p>
</div>
</div>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr">Tom Payerle <br>DIT-ACIGS/Mid-Atlantic Crossroads <a href="mailto:payerle@umd.edu" target="_blank">payerle@umd.edu</a><br></div><div>5825 University Research Park (301) 405-6135<br></div><div dir="ltr">University of Maryland<br>College Park, MD 20740-3831<br></div></div></div></div></div></div>