<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
</head>
<body>
<div class="moz-cite-prefix">On 12/19/19 10:44 AM, Ransom, Geoffrey
M. wrote:<br>
</div>
<blockquote type="cite"
cite="mid:623d0d1ef1214f53a81ccdbc4a41a89d@APLEX03.dom1.jhuapl.edu">
<div class="WordSection1">
<p>The simplest is probably to just have a separate partition that
only allows job times of 1 hour or less.</p>
<p><span style="color:#1F497D">This is how our Univa queues used
to work, with two queues overlapping the same hardware. Univa
shows available “slots” to the users, and we had a lot of
confused users complaining about all those free slots (busy
slots in the other queue) while their jobs sat in the queue,
and new users confused as to why their jobs were being killed
after 4 hours. I was able to move the short/long behavior to
job classes, use resource quota sets (RQSes), and have one
queue.</span></p>
<p><span style="color:#1F497D">While Slurm doesn’t show users
unused resources, I am concerned that going back to two
queues (partitions) will cause user interaction and adoption
problems.</span></p>
<p>It all depends on what best suits the specific needs.</p>
<p>Is there a way to have one partition that holds aside a small
percentage of resources for jobs with a runtime under 4 hours,
i.e. jobs with long runtimes cannot tie up 100% of the
resources at one time? Some kind of virtual partition that
feeds into two other partitions based on runtime would also
work. The goal is that users can continue to submit jobs to one
partition, but the scheduler won’t let 100% of the compute
resources get tied up with multi-week jobs.</p>
</div>
</blockquote>
<p>The way to do this is with Quality of Service (QOS) in Slurm.
When creating a QOS, you can cap the total resources that all
jobs running under that QOS can use at one time. Create a QOS for
the longer-running jobs and set its GrpTRES limit so that the
number of CPUs is less than 100% of your cluster. Create a second
QOS for the shorter jobs with a shorter time limit (MaxWall). <br>
</p>
<p>Once the QOSes are set up, you can either instruct your users to
specify the proper QOS when submitting a job (for example with
--qos), or edit the job_submit.lua script to look at the
requested time limit and assign or override the QOS based on
that. <br>
</p>
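<p>As a rough, untested sketch of the job_submit.lua approach: the
QOS names "short" and "long", the 800-CPU cap, and the 4-hour
cutoff (taken from the question above) are placeholders for
illustration, not values from any real site. The sacctmgr commands
in the comments show one way the two QOSes might be created
beforehand.</p>
<pre>
-- job_submit.lua (sketch, assumes JobSubmitPlugins=lua in slurm.conf)
--
-- Assumed QOS setup, run once by an admin (names and numbers are
-- hypothetical; pick a CPU count below your cluster total):
--   sacctmgr add qos long
--   sacctmgr modify qos long set GrpTRES=cpu=800
--   sacctmgr add qos short
--   sacctmgr modify qos short set MaxWall=04:00:00
-- QOS limits are only enforced if AccountingStorageEnforce in
-- slurm.conf includes "limits" (and "qos" to require a valid QOS).

-- job_desc.time_limit is expressed in minutes.
local SHORT_LIMIT_MIN = 4 * 60

function slurm_job_submit(job_desc, part_list, submit_uid)
   local limit = job_desc.time_limit

   if limit == nil or limit == slurm.NO_VAL then
      -- No time limit requested: the partition default will apply,
      -- so treat the job as a long one to be safe.
      job_desc.qos = "long"
   elseif limit &lt;= SHORT_LIMIT_MIN then
      job_desc.qos = "short"
   else
      job_desc.qos = "long"
   end

   return slurm.SUCCESS
end

-- The lua plugin expects this function to exist as well.
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end
</pre>
<p>Note that this overrides whatever QOS the user asked for, and the
user's association has to be allowed to use both QOSes or the job
will be rejected. If you would rather respect an explicit request,
only set job_desc.qos when it is nil.</p>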
<p>--<br>
Prentice<br>
</p>
</body>
</html>