[slurm-users] ticking time bomb? launching too many jobs in parallel

Fri Aug 30 11:33:55 UTC 2019

It would still be possible to use job arrays in this situation; it's
just slightly messy.
So the way a job array works is that you submit a single script, and
that script is provided an integer for each subjob.  The integer is in
a range, with a possible step (default=1).
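
As a rough illustration of the range-with-step idea (a sketch, not
Slurm's own parser), the helper below expands a single "start-end:step"
specification into the integers the subjobs would receive.  The name
expand_array_spec is made up for this example, and it deliberately
ignores the comma-separated lists and "%" limits that sbatch --array
also accepts.  Inside a running subjob, Slurm exposes the integer as
the SLURM_ARRAY_TASK_ID environment variable.

```python
def expand_array_spec(spec):
    """Expand 'start-end' or 'start-end:step' into a list of task indices.

    Hypothetical helper for illustration only; handles one range.
    """
    range_part, _, step_part = spec.partition(":")
    step = int(step_part) if step_part else 1  # the step defaults to 1
    start_text, _, end_text = range_part.partition("-")
    start = int(start_text)
    end = int(end_text) if end_text else start  # a bare "7" means just task 7
    if step < 1 or end < start:
        raise ValueError(f"bad array spec: {spec!r}")
    return list(range(start, end + 1, step))  # the end of the range is inclusive


if __name__ == "__main__":
    print(expand_array_spec("0-15:4"))
```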

To handle the situation you describe, you would have to predetermine how
many of each test you want to run (i.e., you couldn't dynamically
change the number of jobs that run within one array), and a master
script would map the integer range to the job that was to be started.

The most trivial way to do it would be to put the list of regressions
in a text file and have the master script index it by line number and
then run the appropriate command.
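
A minimal sketch of that text-file approach, assuming one command per
line and a 0-based array range (e.g. an array of 0-359 for 360 lines).
The file name regressions.txt, the REGRESSION_LIST variable and the
function names are all hypothetical; only SLURM_ARRAY_TASK_ID comes
from Slurm itself.

```python
import os
import shlex
import subprocess


def command_for_task(list_path, task_id):
    """Return the command at 0-based index task_id among the non-blank lines."""
    with open(list_path) as handle:
        # Skip blank lines so the file can be spaced out for readability.
        commands = [line.strip() for line in handle if line.strip()]
    if not 0 <= task_id < len(commands):
        raise IndexError(f"task {task_id} is outside 0..{len(commands) - 1}")
    return commands[task_id]


def main():
    # SLURM_ARRAY_TASK_ID is set by Slurm for each subjob of an array.
    task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
    # REGRESSION_LIST and regressions.txt are hypothetical names.
    list_path = os.environ.get("REGRESSION_LIST", "regressions.txt")
    command = command_for_task(list_path, task_id)
    # Run the selected command and hand its exit status back to Slurm.
    raise SystemExit(subprocess.call(shlex.split(command)))


if __name__ == "__main__":
    if "SLURM_ARRAY_TASK_ID" in os.environ:
        main()
    else:
        print("SLURM_ARRAY_TASK_ID is not set; run this inside a job array.")
```
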
A more complex way would be to do some math: an integer divide to get
the script and a modulus to get the subindex for each regression.
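
The divide/modulus mapping could look roughly like this, using the 9
scripts x 40 runs from the example in the quoted message below.  The
script names and function name are placeholders, not anything from the
real regression suite.

```python
RUNS_PER_SCRIPT = 40
# Hypothetical script names; substitute the real regression scripts.
SCRIPTS = [f"test_{n}.sh" for n in range(9)]


def job_for_task(task_id):
    """Map a flat array index to (script name, run index within that script)."""
    # Integer divide picks the script, the remainder picks the repetition.
    script_index, run_index = divmod(task_id, RUNS_PER_SCRIPT)
    if not 0 <= script_index < len(SCRIPTS):
        raise IndexError(f"task {task_id} does not map to any script")
    return SCRIPTS[script_index], run_index


if __name__ == "__main__":
    for task_id in (0, 39, 40, 359):
        print(task_id, job_for_task(task_id))
```

With everything in one array, a single limit then covers the whole
regression: something like "sbatch --array=0-359%90 master.sh" (where
master.sh is a hypothetical wrapper that calls the mapping) keeps at
most 90 subjobs running at once.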

Both of these would require some semi-advanced scripting, but nothing
that couldn't be cut and pasted with some trivial modifications for
each job set.

As to the unavailability of the admin ...
An alternate approach that would require the admin's help would be to
come up with a small set of allocations (e.g., 40 gpus, 80 gpus, 100
gpus, etc.) and make a QOS for each one with a gpu limit (e.g.,
MaxTRESPU=gres/gpu=40).  Then the user would assign that QOS to the job
when starting it to set the overall allocation for all the jobs.  The
admin wouldn't need to tweak this more than once; you just pick which
QOS to use.
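
On the user side, picking a QOS could be scripted along these lines.
This is only a sketch: the QOS names gpu40/gpu80/gpu100 are invented
and would have to match whatever the admin actually creates, and the
code just builds the sbatch argument list (using the --qos option)
without submitting anything.

```python
# Hypothetical QOS names keyed by their per-user GPU limit.
QOS_TIERS = {40: "gpu40", 80: "gpu80", 100: "gpu100"}


def pick_qos(agreed_gpus):
    """Return the QOS with the largest limit that does not exceed agreed_gpus."""
    fitting = [limit for limit in QOS_TIERS if limit <= agreed_gpus]
    if not fitting:
        raise ValueError(f"no QOS tier fits within {agreed_gpus} GPUs")
    return QOS_TIERS[max(fitting)]


def sbatch_command(script, agreed_gpus):
    """Build (but do not run) an sbatch command line using the chosen QOS."""
    return ["sbatch", f"--qos={pick_qos(agreed_gpus)}", script]


if __name__ == "__main__":
    print(sbatch_command("regression.sh", 90))
```

Rounding down to the nearest tier is a choice made here so that the
user never exceeds the share agreed with the group.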

On Fri, Aug 30, 2019 at 2:36 AM Guillaume Perrault Archambault
<gperr050 at uottawa.ca> wrote:
>
> Hi Steven,
>
> Thanks for taking the time to reply to my post.
>
> Setting a limit on the number of jobs for a single array isn't sufficient because regression-tests need to launch multiple arrays, and I would need a job limit that would take effect over all launched jobs.
>
> It's very possible I'm not understanding something. I'll lay out a very specific example in the hopes you can correct me if I've gone wrong somewhere.
>
> Let's take the small cluster with 140 GPUs and no fairshare as an example, because it's easier for me to explain.
>
> The users, who all know each other personally and interact via chat, decide on a daily basis how many jobs each user can run at a time.
>
> Let's say today is Sunday (hypothetically). Nobody is actively developing today, except that user 1 has 10 jobs running for the entire weekend. That leaves 130 GPUs unused.
>
> User 2, whose jobs all run on 1 GPU, decides to run a regression test. The regression test comprises 9 different scripts, each run 40 times, for a grand total of 360 jobs. The scripts take between 1 and 5 hours to complete, and the jobs take on average 4 hours to complete.
>
> User 2 gets the user group's approval (via chat) to use 90 GPUs (so that 40 GPUs will remain for anyone else wanting to work that day).
>
> The problem I'm trying to solve is this: how do I ensure that user 2 launches his 360 jobs in such a way that 90 jobs are in the run state consistently until the regression test is finished?
>
> Keep in mind that:
>
> - limiting each job array to 10 jobs is inefficient: when the first job array finishes (long before the last one), only 80 GPUs will be used, and so on as other arrays finish
> - the admin is not available, he cannot be asked to set a hard limit of 90 jobs for user 2 just for today
>
> I would be happy to use job arrays if they allow me to set an overarching job limit across multiple arrays. Perhaps this is doable. Admittedly I'm working on a paper to be submitted in a few days, so I don't have time to test job arrays thoroughly, but I will try them out more thoroughly once I've submitted my paper (i.e., after Sept 5).
>
> My solution, for now, is to not use job arrays. Instead, I launch each job individually, and I use singleton (by launching all jobs with the same 90 unique names) to ensure that exactly 90 jobs are run at a time (in this case, corresponding to 90 GPUs in use).
>
> Side note: the unavailability of the admin might sound contrived by picking Sunday as an example, but it's in fact very typical. The admin is not available:
>
> - on weekends (the present example)
> - at any time outside of 9am to 5pm (keep in mind, this is a cluster used by students in different time zones)
> - any time he is on vacation
> - any time he is looking after his many other responsibilities. Constantly setting user limits that change on a daily basis would be too much to ask.
>
>
> I'd be happy if you corrected my misunderstandings, especially if you could show me how to set a job limit that takes effect over multiple job arrays.
>
> I may have very glaring oversights as I don't necessarily have a big picture view of things (I've never been an admin, most notably), so feel free to poke holes in the way I've constructed things.
>
> Regards,
> Guillaume.
>
>
> On Fri, Aug 30, 2019 at 1:22 AM Steven Dick <kg4ydw at gmail.com> wrote:
>>
>> This makes no sense and seems backwards to me.
>>
>> When you submit an array job, you can specify how many jobs from the
>> array you want to run at once.
>> So, an administrator can create a QOS that explicitly limits the user.
>> However, you keep saying that they probably won't modify the system
>> for just you...
>>
>> That seems to me to be the perfect case to use array jobs and tell it
>> how many elements of the array to run at once.
>> The reason you give for not using array jobs is exactly the wrong one.
>>
>> On Tue, Aug 27, 2019 at 1:19 PM Guillaume Perrault Archambault
>> <gperr050 at uottawa.ca> wrote:
>> > The reason I don't use job arrays is to be able to limit the number of jobs per user
>>