<div dir="ltr">Hi Steven,<div><br></div><div>Thanks for taking the time to reply to my post.</div><div><br></div><div>Setting a limit on the number of jobs for a single array isn't sufficient, because a regression test needs to launch multiple arrays, and I would need a job limit that takes effect across all of the launched jobs.</div><div><br></div><div>It's very possible I'm not understanding something. I'll lay out a very specific example in the hope that you can correct me if I've gone wrong somewhere.</div><div><br></div><div><div>Let's take the small cluster with 140 GPUs and no fairshare as an example, because it's easier for me to explain.</div></div><div><br></div><div>The users, who all know each other personally and interact via chat, decide on a daily basis how many jobs each user can run at a time.</div><div><br></div><div>Let's say today is Sunday (hypothetically). Nobody is actively developing today, except that user 1 has 10 jobs running for the entire weekend. That leaves 130 GPUs unused.<br></div><div><br></div><div>User 2, whose jobs each run on 1 GPU, decides to run a regression test. The regression test consists of 9 different scripts, each run 40 times, for a grand total of 360 jobs.
The scripts take between 1 and 5 hours each to complete, and a job takes 4 hours on average.</div><div><br></div><div>User 2 gets the user group's approval (via chat) to use 90 GPUs (so that 40 GPUs will remain for anyone else wanting to work that day).</div><div><br></div><div>The problem I'm trying to solve is this: how do I ensure that user 2 launches his 360 jobs in such a way that 90 jobs are consistently in the running state until the regression test is finished?</div><div><br></div><div>Keep in mind that:</div><div><ul><li>limiting each of the 9 job arrays to 10 running jobs (90 in total) is inefficient: when the first job array finishes (long before the last one), only 80 GPUs will be in use, and fewer still as the other arrays finish</li><li>the admin is not available, so he cannot be asked to set a hard limit of 90 jobs for user 2 just for today</li></ul><div>I would be happy to use job arrays if they allow me to set an overarching job limit across multiple arrays. Perhaps this is doable. Admittedly, I'm working on a paper to be submitted in a few days, so I don't have time to test job arrays thoroughly right now, but I will try them out once I've submitted my paper (i.e. after Sept 5). </div><div> <br></div></div><div>My solution, for now, is to not use job arrays. Instead, I launch each job individually, and I use the singleton dependency (by spreading all the jobs across the same 90 unique job names) to ensure that no more than 90 jobs run at a time (in this case, corresponding to 90 GPUs in use).</div><div><br></div><div>Side note: picking Sunday as an example might make the admin's unavailability sound contrived, but it's in fact very typical. The admin is not available:</div><div><ul><li>on weekends (the present example)</li><li>at any time outside of 9am to 5pm (keep in mind, this is a cluster used by students in different time zones)</li><li>any time he is on vacation</li><li>any time he is looking after his many other responsibilities. 
Constantly setting user limits that change on a daily basis would be too much to ask.</li></ul></div><div><br></div><div>I'd be happy if you corrected my misunderstandings, especially if you could show me how to set a job limit that takes effect across multiple job arrays.</div><div><br></div><div>I may have some glaring oversights, as I don't necessarily have a big-picture view of things (most notably, I've never been an admin), so feel free to poke holes in the way I've set things up.</div><div><br></div><div>Regards,</div><div>Guillaume.</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Aug 30, 2019 at 1:22 AM Steven Dick <<a href="mailto:kg4ydw@gmail.com">kg4ydw@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">This makes no sense and seems backwards to me.<br>
<br>
When you submit an array job, you can specify how many jobs from the<br>
array you want to run at once.<br>
So, an administrator can create a QOS that explicitly limits the user.<br>
However, you keep saying that they probably won't modify the system<br>
for just you...<br>
<br>
That seems to me to be the perfect case to use array jobs and tell it<br>
how many elements of the array to run at once.<br>
You're not using array jobs for exactly the wrong reason.<br>
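Steven's suggestion corresponds to Slurm's array throttle. A `%N` suffix on `sbatch --array` caps how many tasks of that one array run at once. The Python sketch below only builds and prints the commands for the regression test in Guillaume's example: 9 scripts, 40 runs each, and a throttle of 10. The script names `test_1.sh` to `test_9.sh` are hypothetical placeholders. The sketch also spells out Guillaume's objection. Because the throttle applies per array, the overall cap is 10 times the number of arrays that still have unfinished tasks, so it drops from 90 to 80 as soon as one array completes.

```python
# Sketch of Slurm's per-array throttle (sbatch --array=<range>%<max>).
# Assumption: the 9 regression scripts are named test_1.sh .. test_9.sh
# (hypothetical placeholders). Commands are only printed, never submitted.

SCRIPTS = [f"test_{k}.sh" for k in range(1, 10)]  # 9 hypothetical scripts
TASKS_PER_SCRIPT = 40  # each script is run 40 times
THROTTLE = 10          # at most 10 running tasks per array


def array_command(script: str, n_tasks: int, throttle: int) -> list[str]:
    """Build an sbatch argv submitting `script` as an array of `n_tasks`
    tasks (indices 0..n_tasks-1), at most `throttle` of them running at once."""
    return ["sbatch", f"--array=0-{n_tasks - 1}%{throttle}", script]


def concurrency_cap(unfinished_arrays: int, throttle: int) -> int:
    """Upper bound on running tasks: the throttle is per array, so the cap
    shrinks as whole arrays finish, even if other arrays still have pending tasks."""
    return unfinished_arrays * throttle


if __name__ == "__main__":
    for script in SCRIPTS:
        print(" ".join(array_command(script, TASKS_PER_SCRIPT, THROTTLE)))
    print("cap with all 9 arrays unfinished:", concurrency_cap(len(SCRIPTS), THROTTLE))
    print("cap after one array finishes:", concurrency_cap(len(SCRIPTS) - 1, THROTTLE))
```

Running it prints nine lines of the form `sbatch --array=0-39%10 test_1.sh`, followed by the two cap values.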
<br>
On Tue, Aug 27, 2019 at 1:19 PM Guillaume Perrault Archambault<br>
<<a href="mailto:gperr050@uottawa.ca" target="_blank">gperr050@uottawa.ca</a>> wrote:<br>
> The reason I don't use job arrays is to be able to limit the number of jobs per user<br>
<br>
</blockquote></div>
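The singleton workaround Guillaume describes can also be sketched in Python. `--job-name` and `--dependency=singleton` are real `sbatch` options: singleton lets a job start only after earlier jobs with the same job name and user have terminated. The round-robin naming scheme `slot_0` to `slot_89` and the script names are assumptions made for this sketch. The commands are printed rather than submitted.

```python
# Sketch of the singleton workaround. Only --job-name and
# --dependency=singleton are real sbatch options; the slot_<k> naming scheme
# and the script names test_1.sh .. test_9.sh are hypothetical assumptions.
# Commands are only printed, never submitted.

SCRIPTS = [f"test_{k}.sh" for k in range(1, 10)]  # 9 hypothetical scripts
RUNS_PER_SCRIPT = 40  # each script is run 40 times -> 360 jobs
SLOTS = 90            # GPU budget agreed on in chat


def singleton_commands(scripts: list[str], runs: int, slots: int) -> list[list[str]]:
    """Return one sbatch argv per job. Job names cycle through slot_0 ..
    slot_{slots-1}; --dependency=singleton lets only one job per
    (user, job name) pair run at a time, so at most `slots` run concurrently."""
    commands = []
    job_index = 0
    for script in scripts:
        for _ in range(runs):
            name = f"slot_{job_index % slots}"
            commands.append(["sbatch", f"--job-name={name}", "--dependency=singleton", script])
            job_index += 1
    return commands


if __name__ == "__main__":
    for argv in singleton_commands(SCRIPTS, RUNS_PER_SCRIPT, SLOTS):
        print(" ".join(argv))
```

Assigning names up front has one consequence. Each name runs its four jobs in submission order, so a name whose jobs happen to be short drains early. Toward the end of the regression test, fewer than 90 jobs may therefore be running. The thread does not say how each run receives its run index, so this sketch leaves that out.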