<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class=""><blockquote type="cite" class=""><div class="">On Jun 27, 2023, at 1:10 AM, Loris Bennett <<a href="mailto:loris.bennett@fu-berlin.de" class="">loris.bennett@fu-berlin.de</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><span class="" style="float: none; display: inline !important;">Hi Reed,</span><br class=""><br class=""><span class="" style="float: none; display: inline !important;">Reed Dier <</span><a href="mailto:reed.dier@focusvq.com" class="">reed.dier@focusvq.com</a><span class="" style="float: none; display: inline !important;">> writes:</span><br class=""><br class=""><blockquote type="cite" class="">Is this an issue with the relative FIFO nature of the priority scheduling currently with all of the other factors disabled,<br class="">or since my queue is fairly deep, is this due to bf_max_job_test being<br class="">the default 100, and it can’t look deep enough into the queue to find<br class="">a job that will fit into what is unoccupied?<br class=""></blockquote><br class=""><span class="" style="float: none; display: inline !important;">It could be that bf_max_job_test is too low.  On our system some users</span><br class=""><span class="" style="float: none; display: inline !important;">think it is a good idea to submit lots of jobs with identical resource</span><br class=""><span class="" style="float: none; display: inline !important;">requirements by writing a loop around sbatch.  Such jobs will exhaust</span><br class=""><span class="" style="float: none; display: inline !important;">the bf_max_job_test very quickly.  
Thus we increased the limit to 1000</span><br class=""><span class="" style="float: none; display: inline !important;">and try to persuade users to use job arrays instead of home-grown loops.</span><br class=""><span class="" style="float: none; display: inline !important;">This seems to work OK[1].</span><br class=""><br class=""><span class="" style="float: none; display: inline !important;">Cheers,</span><br class=""><br class=""><span class="" style="float: none; display: inline !important;">Loris</span><br class=""><br class=""><span class="" style="float: none; display: inline !important;">-- </span><br class=""><span class="" style="float: none; display: inline !important;">Dr. Loris Bennett (Herr/Mr)</span><br class=""><span class="" style="float: none; display: inline !important;">ZEDAT, Freie Universität Berlin</span></div></blockquote></div><div class=""><br class=""></div>Thanks Loris,<div class="">I think this will be the next knob to turn, and your experience gives me a bit more confidence in that, as we too have many such identical jobs.<br class=""><div><br class=""></div><blockquote type="cite" class=""><div class="">On Jun 26, 2023, at 9:10 PM, Brian Andrus <<a href="mailto:toomuchit@gmail.com" class="">toomuchit@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class="">Reed,<br class=""><br class="">You may want to look at the timelimit aspect of the job(s).<br class=""><br class="">For one to 'squeeze in', it needs to be able to finish before the resources in use are expected to become available.<br class=""><br class="">Consider:<br class="">Job A is running on 2 nodes of a 3-node cluster. It will finish in 1 hour.<br class="">Pending job B will run for 2 hours and needs 2 nodes, but only 1 is free, so it waits.<br class="">Pending job C (with a lower priority) needs 1 node for 2 hours. Hmm, well it won't finish before the time job B is expected to start, so it waits.<br class="">Pending job D (with even lower priority) needs 1 node for 30 minutes. 
That can squeeze in before the additional node for Job B is expected to be available, so it runs on the idle node.<br class=""><br class="">Brian Andrus</div></blockquote></div><div class=""><br class=""></div><div class="">Thanks Brian,</div><div class=""><br class=""></div><div class="">Our layout is a bit less exciting, in that none of these are >1 node per job.</div><div class="">So blocking out nodes for job:node Tetris isn’t really at play here.</div><div class="">The timing, however, is something I may turn an eye towards.</div><div class="">Most jobs have a “sanity” time limit applied, in that it is not so much an <i class="">expected</i> time limit, but rather an “if it goes this long, something obviously went awry and we shouldn’t keep holding on to resources” limit.</div><div class="">So it’s a bit hard to quantify the timing portion, but I haven’t looked into Slurm’s estimates of when it thinks the next task will start, etc.</div><div class=""><br class=""></div><div class="">The pretty simplistic example at play here is that there are nodes that are ~50-60% loaded for CPU and memory.</div><div class="">The next job up is a “whale” job that wants a ton of resources, CPU and/or memory, but down the line there is a job with 2 CPUs and 2 GB of memory that can easily slot into the unused resources.</div><div class=""><br class=""></div><div class="">So my thinking was that bf_max_job_test may be too low for the scheduler to get far enough down the queue to see that it could shove that job into some holes.</div><div class=""><br class=""></div><div class="">I’ll report back any findings after testing Loris’s suggestions.</div><div class=""><br class=""></div><div class="">Appreciate everyone’s help and suggestions,</div><div class="">Reed</div></body></html>