<div dir="ltr"><div><div>Thanks Bill, I really appreciate the time you spent giving this detailed answer. </div><div>I will have a look at the plugin system, as the integration with our accounting system would be a nice feature.<br></div></div><div><br></div><div>@<span style="font-size:12.8px">Chris thanks, I've had a look at </span>GrpTRES but I'll probably go with the SPANK route.</div><div><br></div><div>Best, </div><div>Matteo</div><div class="gmail_extra"><br><div class="gmail_quote">On 6 February 2018 at 13:58, Bill Barth <span dir="ltr"><<a href="mailto:bbarth@tacc.utexas.edu" target="_blank">bbarth@tacc.utexas.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Chris probably gives the Slurm-iest way to do this, but we use a SPANK plugin that counts the jobs that a user has in queue (running and waiting) and sets a hard cap on how many they can have. This should probably be scaled to the size of the system and the partition they are submitting to, but on Stampede 2 (4200 KNL nodes and 1736 SKX nodes), we set this, across all queues, to about 50, which has been our magic number across numerous schedulers over the years, on systems ranging from hundreds of nodes to Stampede 1 with 6400. Some users get more by request and most don’t even bump up against the limits. We’ve started to look at using TRES on our test system, but we haven’t gotten there yet. Our use of the DB is minimal, and our process to get every user into it when their TACC account is created is not 100% automated yet (we use the job completion plugin to create a flat file with job records which our local accounting system consumes to decrement allocation balances, if you care to know).<br>
<br>
Best,<br>
Bill.<br>
<span class="m_4530399368953462667gmail-m_7493636192774007441gmail-m_308855199775714190gmail-HOEnZb"><font color="#888888"><br>
--<br>
Bill Barth, Ph.D., Director, HPC<br>
<a href="mailto:bbarth@tacc.utexas.edu" target="_blank">bbarth@tacc.utexas.edu</a> | Phone: <a href="tel:%28512%29%20232-7069" value="+15122327069" target="_blank">(512) 232-7069</a><br>
Office: ROC 1.435 | Fax: <a href="tel:%28512%29%20475-9445" value="+15124759445" target="_blank">(512) 475-9445</a><br>
</font></span><div class="m_4530399368953462667gmail-m_7493636192774007441gmail-m_308855199775714190gmail-HOEnZb"><div class="m_4530399368953462667gmail-m_7493636192774007441gmail-m_308855199775714190gmail-h5"><br>
<br>
<br>
On 2/6/18, 6:03 AM, "slurm-users on behalf of Christopher Samuel" <<a href="mailto:slurm-users-bounces@lists.schedmd.com" target="_blank">slurm-users-bounces@lists.schedmd.com</a> on behalf of <a href="mailto:chris@csamuel.org" target="_blank">chris@csamuel.org</a>> wrote:<br>
<br>
On 06/02/18 21:40, Matteo F wrote:<br>
<br>
> I've tried to limit the number of running jobs using QOS -><br>
> MaxJobsPerAccount, but this wouldn't stop a user from just filling up the<br>
> cluster with fewer (but bigger) jobs.<br>
<br>
You probably want to look at what you can do with the slurmdbd database<br>
and associations. Things like GrpTRES:<br>
<br>
<a href="https://slurm.schedmd.com/sacctmgr.html" rel="noreferrer" target="_blank">https://slurm.schedmd.com/sacctmgr.html</a><br>
<br>
# GrpTRES=<TRES=max TRES,...><br>
# Maximum number of TRES running jobs are able to be allocated in<br>
# aggregate for this association and all associations which are children<br>
# of this association. To clear a previously set value use the modify<br>
# command with a new value of -1 for each TRES id.<br>
#<br>
# NOTE: This limit only applies fully when using the Select Consumable<br>
# Resource plugin.<br>
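<br>
A minimal sketch of setting such a limit with sacctmgr, assuming a hypothetical account named physics and an arbitrary cap of 512 CPUs; adjust both to your own site:<br>
<pre>
# Assumption: "physics" is a placeholder account name and 512 is an example value.
# Cap the total CPUs that running jobs under this account (and its children) may hold.
sacctmgr modify account where name=physics set GrpTRES=cpu=512

# Check the limit that is now stored on the association.
sacctmgr show assoc where account=physics format=Account,User,GrpTRES

# Clear the limit again by setting the TRES to -1, as the man page describes.
sacctmgr modify account where name=physics set GrpTRES=cpu=-1
</pre>
Because the limit is counted in aggregate over the association, it should bound the CPUs in use however the jobs are sized, which is the case a job-count limit such as MaxJobsPerAccount does not cover.<br>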
<br>
Best of luck,<br>
Chris<br>
<br>
<br>
<br>
</div></div></blockquote></div><br></div></div>