> Eventually the job aging makes the jobs so high-priority,

I guess I should look in the manual, but could you increase the job ageing time parameters? (A rough sketch of the relevant slurm.conf knobs is further down.)
It is also worth saying that this is the scheduler doing its job - it is supposed to keep jobs ready and waiting to go, to keep the cluster busy!

I was going to suggest that you could have a cron job which looks at the jobs the 'queue stuffer' has and moves some of them down in priority.
This is a bad suggestion - in general, writing a 'scheduler within a scheduler' is not a good thing, and you only end up fighting the real scheduler.

I did have a similar situation in my last job - a user needed to get some work done and submitted a huge number of jobs.
It happened that there was a low load on the cluster at the time, so this user got a lot of jobs started. We finally had to temporarily limit the
maximum number of jobs he could submit (example sacctmgr commands further down). Again, if you think about it, this is a good thing - we are operating batch queueing systems, and this user was putting them to good use.

The 'problem' is more related to the length of the jobs. If the 'queue stuffer' is submitting jobs with a long wallclock time, then yes, you will get complaints
from the other users. With shorter jobs there is more opportunity for other users to 'get a look in', as we say in Glasgow.

Actually, what IS bad is users not putting cluster resources to good use. You can often see jobs which are 'stalled' - i.e. the nodes are reserved for the job,
but the internal logic of the job has failed and the executables have not launched. Or maybe some user is running an interactive job and has wandered
off for coffee/beer/an extended holiday. It is well worth scanning for stalled jobs and terminating them.
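If it helps, here is the sort of thing I mean by scanning for stalled jobs - a rough, untested sketch rather than anything from a production setup. It flags running jobs whose allocated nodes all report near-zero CPU load, so it only really makes sense where nodes are allocated exclusively, and the threshold is made up:

    #!/bin/bash
    # Flag running jobs whose allocated nodes all show ~zero CPU load.
    # The threshold is a guess - tune it for your site, and keep scancel
    # commented out until a human has reviewed the output.
    THRESHOLD=0.5

    # %A = job id, %u = user, %M = elapsed time, %N = allocated node list
    squeue -h -t R -o "%A %u %M %N" | while read -r jobid user elapsed nodelist; do
        idle=1
        # expand a compressed node list like node[01-04] into individual names
        for node in $(scontrol show hostnames "$nodelist"); do
            load=$(sinfo -h -N -n "$node" -o "%O" | head -1)
            # any node with a real CPU load above the threshold counts as a sign of life
            if [ "$load" != "N/A" ] && awk -v l="$load" -v t="$THRESHOLD" 'BEGIN{exit !(l >= t)}'; then
                idle=0
                break
            fi
        done
        if [ "$idle" -eq 1 ]; then
            echo "Job $jobid ($user, running for $elapsed) looks stalled on $nodelist"
            # scancel "$jobid"
        fi
    done

Run something like that from cron and mail yourself the output for a while before you let it anywhere near scancel.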
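On the ageing parameters: I have not tested this against your config, but if you are on the multifactor priority plugin, the age factor is controlled in slurm.conf roughly like this (the numbers are made-up illustrations, not recommendations):

    PriorityType=priority/multifactor
    # relative weight of the age factor versus e.g. fair-share
    PriorityWeightAge=1000
    PriorityWeightFairshare=10000
    # how long a job must wait before its age factor saturates (default 7-0, i.e. 7 days)
    PriorityMaxAge=14-0

Either lowering PriorityWeightAge or raising PriorityMaxAge should stop waiting jobs from racing to the top of the queue quite so quickly. As far as I remember an 'scontrol reconfigure' is enough to pick the changes up, but do check the slurm.conf man page.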
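As for capping a user's jobs, if you ever need to do the same it is done with the per-association limits in the accounting database - something along the lines below, assuming you have associations set up and 'limits' in AccountingStorageEnforce ('heavyuser' is obviously a placeholder):

    # cap how many jobs the user may have queued at once
    sacctmgr modify user where name=heavyuser set MaxSubmitJobs=100
    # and/or how many may run at the same time
    sacctmgr modify user where name=heavyuser set MaxJobs=20
    # lift the limits again afterwards (-1 clears a limit)
    sacctmgr modify user where name=heavyuser set MaxSubmitJobs=-1 MaxJobs=-1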
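Regarding the bf_max_job_user parameter Bjørn-Helge mentions below: for the archives, it goes into SchedulerParameters in slurm.conf alongside whatever backfill options you already have, e.g. (the value 10 is just an illustration, and bf_continue is only there as an example of combining options):

    SchedulerParameters=bf_continue,bf_max_job_user=10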
On 8 May 2018 at 09:25, Ole Holm Nielsen <Ole.H.Nielsen@fysik.dtu.dk> wrote:
> On 05/08/2018 08:44 AM, Bjørn-Helge Mevik wrote:
>> Jonathon A Anderson <jonathon.anderson@colorado.edu> writes:
>>
>>> ## Queue stuffing
>>
>> There is the bf_max_job_user SchedulerParameter, which is sort of the
>> "poor man's MAXIJOB"; it limits the number of jobs from each user the
>> backfiller will try to start on each run. It doesn't do exactly what
>> you want, but at least the backfiller will not create reservations for
>> _all_ the queue stuffer's jobs.
>
> Adding to this I discuss backfilling configuration in
> https://wiki.fysik.dtu.dk/niflheim/Slurm_scheduler#scheduler-configuration
>
> The MaxJobCount limit etc. is described in
> https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#maxjobcount-limit
>
> /Ole