<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
</head>
<body>
<p>Yeah, I imagine that this also varies depending on the average
length of a job. In our case we do about a job per core per day, so
double the number of cores works out well for us. However, if you
have higher turnover, permitting a higher number of jobs is wiser.</p>
<p><br>
</p>
<p>We have a ton of partitions (130 as of last count), so our tuning
has been a bit more complicated. However, the latest version of
Slurm (20.02) vastly improved the backfill efficiency, which has
helped with making sure the cluster is full. Nonetheless, we still
seem to average a job per core per day here.</p>
<p><br>
</p>
<p>-Paul Edmon-</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 8/10/2020 4:43 PM, Sebastian T Smith
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:BYAPR01MB46772C502917C8CCC5652D02A5440@BYAPR01MB4677.prod.exchangelabs.com">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
<div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0); background-color:
rgb(255, 255, 255);">
My rule of thumb for our cluster is 1,024 jobs/node. Our
nodes have 32 cores, so we're 32x core count (converting to
Paul's units). We have 120 nodes with a maximum of 122,880
jobs.
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0); background-color:
rgb(255, 255, 255);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0); background-color:
rgb(255, 255, 255);">
At a high level, nodes are allocated to different partitions
and each partition is allocated a maximum number of jobs equal
to 1024 * num_nodes (reality isn't quite this simple). Our
largest partition features 54,272 max jobs (53 nodes). I've
seen this maxed out a number of times with a large number of
very short jobs, and with job arrays.
<br>
</div>
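<p>As a rough sketch of the arithmetic above: the function names below are made up for illustration and are not part of any Slurm API, and the sketch ignores the caveat that reality isn't quite this simple.</p>
<pre>
```python
# Rule of thumb from the message above: 1,024 jobs per node.
JOBS_PER_NODE = 1024


def partition_max_jobs(num_nodes, jobs_per_node=JOBS_PER_NODE):
    """Maximum jobs for a partition (or cluster) with num_nodes nodes."""
    return jobs_per_node * num_nodes


def jobs_per_core(cores_per_node, jobs_per_node=JOBS_PER_NODE):
    """Convert the per-node rule into a multiple of the core count."""
    return jobs_per_node / cores_per_node


print(partition_max_jobs(120))  # whole cluster, 120 nodes
print(partition_max_jobs(53))   # largest partition, 53 nodes
print(jobs_per_core(32))        # 32-core nodes
```
</pre>
<p>With the node counts quoted above this gives 122,880 and 54,272 max jobs, and a factor of 32 jobs per core.</p>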
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0); background-color:
rgb(255, 255, 255);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0); background-color:
rgb(255, 255, 255);">
This setup has required a bit of tuning. Adjusting
sched_max_job_start and sched_min_interval has been sufficient
to keep Slurm responsive when users are submitting or
cancelling a large number of jobs. Backfill tuning has been
difficult because we made some poor decisions setting
DefaultTime when our system was first set up. Overall performance
has been excellent after minimal tuning.<br>
</div>
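<p>For reference, both options are set through SchedulerParameters in slurm.conf. The snippet below is only a sketch of how such a line could be assembled. The helper name is hypothetical and the values are placeholders, not the settings used on the cluster described above.</p>
<pre>
```python
def scheduler_parameters_line(options):
    """Render a SchedulerParameters line for slurm.conf from a dict of options."""
    pairs = ",".join(f"{name}={value}" for name, value in options.items())
    return f"SchedulerParameters={pairs}"


# Placeholder values chosen for illustration only; tune for your own site.
placeholder_options = {
    "sched_max_job_start": 200,       # jobs started per scheduling pass
    "sched_min_interval": 2_000_000,  # microseconds between scheduling passes
}

print(scheduler_parameters_line(placeholder_options))
```
</pre>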
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0); background-color:
rgb(255, 255, 255);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0); background-color:
rgb(255, 255, 255);">
- Sebastian<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0); background-color:
rgb(255, 255, 255);">
<br>
</div>
<div id="Signature">
<div>
<div id="divtagdefaultwrapper" dir="ltr" style="font-size:
12pt; color: rgb(0, 0, 0); font-family: Calibri, Arial,
Helvetica, sans-serif;">
<div style="font-family:Tahoma; font-size:13px">
<div style="font-family:Tahoma; font-size:13px">
<div class="BodyFragment"><font size="2"><span
style="font-size:10pt">
<div class="PlainText">
<div id="emailSignature" style="display:block">--</div>
<div id="emailSignature" style="display:block"><br>
<table style="box-sizing: border-box;
border-collapse: collapse; border-spacing:
0px; background-color: rgb(255, 255, 255);
max-width: 100%; text-align: start;
font-family: Helvetica, Arial, sans-serif;
width: 480px;">
<tbody style="box-sizing:border-box">
<tr style="box-sizing:border-box">
<td style="box-sizing:border-box;
vertical-align:top" width="75"><a
href="http://www.unr.edu/"
style="box-sizing: border-box;
background: transparent none
repeat scroll 0% 0%; color:
rgb(46, 108, 162);
text-decoration: underline;"
moz-do-not-send="true"><img
alt="University of Nevada, Reno"
style="box-sizing:border-box;
vertical-align:middle"
src="https://www.unr.edu/assets/images/blockn-100x100.png"
moz-do-not-send="true"
width="75"></a></td>
<td style="box-sizing:border-box"
width="20"> </td>
<td style="box-sizing:border-box;
vertical-align:top" width="385">
<table style="box-sizing:border-box;
border-collapse:collapse;
max-width:100%; width:385px;
text-align:left">
<tbody
style="box-sizing:border-box">
<tr
style="box-sizing:border-box">
<td
style="box-sizing:border-box;
vertical-align:top"><span
style="box-sizing:border-box"><span
style="box-sizing:
border-box; color:
rgb(0, 51, 102);"><strong
style="box-sizing:border-box; font-weight:bold"><span
style="box-sizing:border-box">Sebastian
Smith<br
style="box-sizing:border-box">
</span></strong></span><span
style="box-sizing:border-box"><span style="box-sizing:border-box">High-Performance
Computing Engineer<br
style="box-sizing:border-box">
</span><span
style="box-sizing:border-box">Office
of Information
Technology<br
style="box-sizing:border-box">
</span><span
style="box-sizing:border-box">1664
North Virginia Street<br
style="box-sizing:border-box">
</span><span
style="box-sizing:border-box">MS
0291<br
style="box-sizing:border-box">
</span><span
style="box-sizing:border-box"></span><span
style="box-sizing:border-box"></span><br style="box-sizing:border-box">
<strong
style="box-sizing:border-box;
font-weight:bold">work-phone:<span> </span></strong><span
style="box-sizing:border-box"><a href="tel:7756825050"
style="box-sizing:
border-box;
background:
transparent none
repeat scroll 0% 0%;
text-decoration:
underline;"
moz-do-not-send="true">775-682-5050</a><br
style="box-sizing:border-box">
</span><strong
style="box-sizing:border-box;
font-weight:bold"></strong><span
style="box-sizing:border-box"></span><strong
style="box-sizing:border-box;
font-weight:bold">email:<span> </span></strong><span
style="box-sizing:border-box"><a href="mailto:stsmith@unr.edu"
style="box-sizing:
border-box;
background:
transparent none
repeat scroll 0% 0%;
text-decoration:
underline;"
moz-do-not-send="true">stsmith@unr.edu</a><br
style="box-sizing:border-box">
</span><strong
style="box-sizing:border-box;
font-weight:bold">website:<span> </span></strong><span
style="box-sizing:border-box"><a href="http://rc.unr.edu/"
style="box-sizing:
border-box;
background:
transparent none
repeat scroll 0% 0%;
text-decoration:
underline;"
moz-do-not-send="true">http://rc.unr.edu</a><br
style="box-sizing:border-box">
</span></span><br
style="box-sizing:border-box">
</span></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</div>
</div>
</span></font></div>
</div>
</div>
</div>
</div>
</div>
</div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt"
face="Calibri, sans-serif" color="#000000"><b>From:</b>
slurm-users <a class="moz-txt-link-rfc2396E" href="mailto:slurm-users-bounces@lists.schedmd.com">&lt;slurm-users-bounces@lists.schedmd.com&gt;</a> on
behalf of Paul Edmon <a class="moz-txt-link-rfc2396E" href="mailto:pedmon@cfa.harvard.edu">&lt;pedmon@cfa.harvard.edu&gt;</a><br>
<b>Sent:</b> Friday, August 7, 2020 6:22 AM<br>
<b>To:</b> <a class="moz-txt-link-abbreviated" href="mailto:slurm-users@lists.schedmd.com">slurm-users@lists.schedmd.com</a>
<a class="moz-txt-link-rfc2396E" href="mailto:slurm-users@lists.schedmd.com">&lt;slurm-users@lists.schedmd.com&gt;</a><br>
<b>Subject:</b> Re: [slurm-users] Tuning MaxJobs and
MaxJobsSubmit per user and for the whole cluster?</font>
<div> </div>
</div>
<div>
<p>My rule of thumb is that the MaxJobs for the entire cluster
is twice the number of cores you have available. That way you
have enough jobs running to fill all the cores and enough jobs
pending to refill them.</p>
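<p>A minimal sketch of that rule of thumb follows. The function name and the example core count are assumptions for illustration and do not come from any Slurm API.</p>
<pre>
```python
def cluster_max_jobs(total_cores, multiplier=2):
    """Cluster-wide job cap: one running job per core plus one pending job to refill it."""
    return total_cores * multiplier


# Hypothetical cluster size, chosen only for illustration.
example_cores = 50_000
print(cluster_max_jobs(example_cores))
```
</pre>
<p>Sites with many short jobs may want a larger multiplier than 2.</p>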
<p><br>
</p>
<p>As for per-user MaxJobs, it just depends on what you think is
the maximum number any user can run without causing damage to
themselves or the underlying filesystems, or interfering with
other users. Practical experience has led us to set that
limit to 10,000 on our cluster, but I imagine it will vary
from location to location.</p>
<p><br>
</p>
<p>-Paul Edmon-</p>
<p><br>
</p>
<div class="x_moz-cite-prefix">On 8/6/2020 10:31 PM, Hoyle, Alan
P wrote:<br>
</div>
<blockquote type="cite">
<style type="text/css" style="display:none">
<!--
p
{margin-top:0;
margin-bottom:0}
-->
</style>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0);
background-color:rgb(255,255,255)">
<span style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt">I can't find any advice online about how
to tune things like MaxJobs on a per-cluster or per-user
basis. </span><br>
</div>
<div>
<div dir="ltr">
<div
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0);
background-color:rgb(255,255,255)">
<br>
</div>
<div
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0);
background-color:rgb(255,255,255)">
As far as I can tell, the default install's cluster MaxJobs
seems to be 10,000, with MaxSubmit the same. Those seem
pretty low to me: are there resources that get consumed if
MaxSubmit is much higher, or can we raise this without much
worry? </div>
<div
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0);
background-color:rgb(255,255,255)">
<br>
</div>
<div
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0);
background-color:rgb(255,255,255)">
Is there advice anywhere about tuning these? When I
google, all I can find are the generic "here's how to
change this" and various universities' documentation of
"here are the limits we have set." <br>
</div>
<div
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0);
background-color:rgb(255,255,255)">
<br>
</div>
<div
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0);
background-color:rgb(255,255,255)">
-alan</div>
<div>
<div
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div id="x_x_Signature">
<div>
<div id="x_x_divtagdefaultwrapper" dir="ltr"
style="font-size:12pt;
font-family:Calibri,Helvetica,sans-serif;
color:rgb(0,0,0)">
<p style="margin-top:0; margin-bottom:0">
</p>
<div>--</div>
<div>Alan Hoyle - <a
class="x_moz-txt-link-abbreviated"
href="mailto:alanh@unc.edu"
moz-do-not-send="true">
alanh@unc.edu</a></div>
<div>Bioinformatics Scientist</div>
<div>UNC Lineberger - Bioinformatics Core</div>
<div><a class="x_moz-txt-link-freetext"
href="https://lbc.unc.edu/"
moz-do-not-send="true">https://lbc.unc.edu/</a></div>
<br>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</body>
</html>