[slurm-users] ticking time bomb? launching too many jobs in parallel

Fri Aug 30 18:55:23 UTC 2019

Hi Steven,

Those both sound like potentially good solutions.

So basically, you're saying that if I script it properly, I can use a
single job array to launch multiple scripts by using a master sbatch script.

My concern, though, is what happens if each script (the 9 scripts in my
earlier example) has different requirements? For example, a script might
need to run on a different partition or need a different time limit. My
understanding is that in a single job array, every job gets the same job
requirements.

The other issue is that, with the way I've implemented it, I can change the
maximum number of concurrent jobs dynamically.

I'll illustrate this using my earlier example. Suppose user 2 launches his
360 jobs with a 90-job limit (leaving 40 GPUs unused), and then user 3
realizes he needs to use 45 GPUs.

User 2 decides to drop his usage to a maximum of 45 jobs.

He can simply rename his pending singleton jobs so that they use only 45
unique names, which reduces his maximum from 90 concurrent jobs to 45 (I
wrote a script to do that, so it's a one-liner for user 2).
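A minimal sketch of what such a renaming helper could look like, assuming the jobs were submitted with `--dependency=singleton` and that the list of pending job IDs has already been collected (for example from `squeue`). The function name `rename_commands`, the `regress` prefix and the job IDs are hypothetical; the sketch only builds `scontrol update ... JobName=...` commands so they can be reviewed before running them.

```python
from typing import List


def rename_commands(pending_ids: List[int], max_jobs: int,
                    prefix: str = "regress") -> List[str]:
    """Spread pending jobs round-robin over `max_jobs` distinct names.

    With --dependency=singleton, at most one job per (user, job name) runs
    at a time, so using `max_jobs` distinct names caps concurrency at
    `max_jobs` once the jobs that are already running have drained.
    """
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    cmds = []
    for i, job_id in enumerate(pending_ids):
        name = f"{prefix}_{i % max_jobs}"  # hypothetical naming scheme
        cmds.append(f"scontrol update JobId={job_id} JobName={name}")
    return cmds


if __name__ == "__main__":
    # Hypothetical job IDs; print rather than execute so the commands can be checked.
    for cmd in rename_commands([1001, 1002, 1003], max_jobs=2):
        print(cmd)
```

Note that jobs already running keep their old names, so the lower limit only takes full effect as those jobs finish.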

Can the maximum job limit be modified after submission when using one big
job array?

The docs describe the '%' separator for limiting the number of concurrently
running jobs, e.g. "--array=0-15%4". I could be wrong, but this sounds like
a submit-time-only option that cannot be changed after submission.
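For reference, a small sketch of the two pieces involved, under stated assumptions. The first function builds the `--array` value with the documented '%' throttle. The second builds an `scontrol update` command using the `ArrayTaskThrottle` field, which the scontrol man page of recent Slurm releases documents for changing the throttle of an already-submitted array (0 removes the limit); whether your cluster's version supports it should be checked with `man scontrol`. The job ID and script name are hypothetical.

```python
from typing import Optional


def array_spec(first: int, last: int, max_running: Optional[int] = None) -> str:
    """Build the value for sbatch's --array option, e.g. "0-15%4"."""
    if last < first:
        raise ValueError("last must be >= first")
    spec = f"{first}-{last}"
    if max_running is not None:
        spec += f"%{max_running}"  # '%' caps how many tasks run at once
    return spec


def throttle_update(job_id: int, max_running: int) -> str:
    """Build a command that changes the throttle of a submitted array.

    Assumes a Slurm version whose scontrol supports ArrayTaskThrottle.
    """
    return f"scontrol update JobId={job_id} ArrayTaskThrottle={max_running}"


if __name__ == "__main__":
    # Hypothetical script name and job ID.
    print("sbatch --array=" + array_spec(0, 15, 4) + " regress.sh")
    print(throttle_update(12345, 2))
```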

I also kind of like the idea of various QOSs for different job limits. I'm
not sure I'll be able to get the admin on board, but I'll bring it up. Even
if I do get them on board, won't I have the same problem of the maximum
limit being locked in at submit time?

Can you change the QOS of a job when it's still pending?

Thanks a lot for your help!

Regards,
Guillaume

On Fri, Aug 30, 2019 at 7:36 AM Steven Dick <kg4ydw at gmail.com> wrote:

> It would still be possible to use job arrays in this situation, it's
> just slightly messy.
> So the way a job array works is that you submit a single script, and
> that script is provided an integer for each subjob.  The integer is in
> a range, with a possible step (default=1).
>
> To run the situation you describe, you would have to predetermine how
> many of each test you want to run (i.e., you couldn't dynamically
> change the number of jobs that run within one array), and a master
> script would map the integer range to the job that was to be started.
>
> The simplest way to do it would be to put the list of regressions
> in a text file and have the master script index it by line number and
> then run the appropriate command.
> A more complex way would be to do some math (an integer divide?) to get
> the script name and the subindex (modulus?) for each regression.
>
> Both of these would require some semi-advanced scripting, but nothing
> that couldn't be cut and pasted with some trivial modifications for
> each job set.
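A minimal sketch of such a master script, assuming 9 hypothetical scripts named `test_0.sh` to `test_8.sh`, each run 40 times with the repetition number as its argument. It shows both mappings described above: integer division and modulus on the array index, and picking a non-empty line from a text file. Slurm sets `SLURM_ARRAY_TASK_ID` for each array task; the `#SBATCH` lines are ordinary Python comments but are read by sbatch, and every task gets the same resources, which is the limitation raised in the reply above.

```python
#!/usr/bin/env python3
#SBATCH --gres=gpu:1
#SBATCH --array=0-359%90
import os
import subprocess
import sys
from typing import List, Tuple

# Hypothetical regression set: 9 scripts, each run REPEATS times.
SCRIPTS = [f"test_{k}.sh" for k in range(9)]
REPEATS = 40


def task_from_index(index: int, scripts: List[str], repeats: int) -> Tuple[str, int]:
    """Division/modulus mapping: array index -> (script, repetition number)."""
    script_idx, rep = divmod(index, repeats)
    if not 0 <= script_idx < len(scripts):
        raise IndexError(f"array index {index} out of range")
    return scripts[script_idx], rep


def command_from_file(index: int, path: str) -> str:
    """Text-file mapping: the array index selects the index-th non-empty line."""
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    return lines[index]


def main() -> int:
    idx = int(os.environ["SLURM_ARRAY_TASK_ID"])
    script, rep = task_from_index(idx, SCRIPTS, REPEATS)
    return subprocess.call(["bash", script, str(rep)])


# Only run the task when launched by Slurm as an array element.
if __name__ == "__main__" and "SLURM_ARRAY_TASK_ID" in os.environ:
    sys.exit(main())
```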
>
> As to the unavailability of the admin ...
> An alternate approach that would require the admin's help would be to
> come up with a small set of allocations (e.g., 40 GPUs, 80 GPUs, 100
> GPUs, etc.) and make a QOS for each one with a GPU limit (e.g.,
> maxtrespu=gres/gpu=40). Then the user would assign that QOS to the job
> when starting it to set the overall allocation for all the jobs. The
> admin would only need to set this up once; after that, you just pick
> which tier to use.
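A sketch of how a user might pick a tier under this scheme, assuming the admin has created hypothetical QOSs `gpu40`, `gpu80` and `gpu100`, each with a per-user GPU cap (e.g. `sacctmgr modify qos gpu80 set MaxTRESPerUser=gres/gpu=80`). Because the cap is per user, it applies across all of that user's jobs and arrays in the QOS. The helper picks the largest tier that does not exceed the agreed GPU count; `sbatch --qos=...` selects it at submission, and `scontrol update JobId=... QOS=...` is the command one would try for a pending job, subject to site policy.

```python
from typing import Dict

# Hypothetical QOS names and per-user GPU caps, set up once by the admin.
QOS_TIERS: Dict[str, int] = {"gpu40": 40, "gpu80": 80, "gpu100": 100}


def pick_qos(gpus_agreed: int, tiers: Dict[str, int] = QOS_TIERS) -> str:
    """Return the largest tier whose cap does not exceed the agreed GPU count."""
    fitting = [(cap, name) for name, cap in tiers.items() if cap <= gpus_agreed]
    if not fitting:
        raise ValueError(f"no QOS tier fits within {gpus_agreed} GPUs")
    return max(fitting)[1]


def qos_update(job_id: int, qos: str) -> str:
    """Build the command for moving a (pending) job to another QOS."""
    return f"scontrol update JobId={job_id} QOS={qos}"


if __name__ == "__main__":
    qos = pick_qos(90)  # the group agreed on 90 GPUs for user 2
    print(f"sbatch --qos={qos} regress.sh")  # hypothetical script name
    print(qos_update(12345, pick_qos(45)))   # hypothetical job ID
```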
>
> On Fri, Aug 30, 2019 at 2:36 AM Guillaume Perrault Archambault
> <gperr050 at uottawa.ca> wrote:
> >
> > Hi Steven,
> >
> > Thanks for taking the time to reply to my post.
> >
> > Setting a limit on the number of jobs for a single array isn't
> > sufficient, because a regression test needs to launch multiple arrays,
> > and I would need a job limit that applies across all launched jobs.
> >
> > It's very possible I'm not understanding something. I'll lay out a very
> > specific example in the hope that you can correct me if I've gone wrong
> > somewhere.
> >
> > Let's take the small cluster with 140 GPUs and no fairshare as an
> > example, because it's easier for me to explain.
> >
> > The users, who all know each other personally and interact via chat,
> > decide on a daily basis how many jobs each user can run at a time.
> >
> > Let's say today is Sunday (hypothetically). Nobody is actively
> > developing today, except that user 1 has 10 jobs running for the entire
> > weekend. That leaves 130 GPUs unused.
> >
> > User 2, whose jobs all run on 1 GPU, decides to run a regression test.
> > The regression test comprises 9 different scripts, each run 40 times,
> > for a grand total of 360 jobs. The scripts take between 1 and 5 hours
> > to complete, and the jobs take 4 hours on average.
> >
> > User 2 gets the user group's approval (via chat) to use 90 GPUs (so
> > that 40 GPUs will remain for anyone else wanting to work that day).
> >
> > The problem I'm trying to solve is this: how do I ensure that user 2
> > launches his 360 jobs in such a way that 90 jobs are consistently in the
> > running state until the regression test is finished?
> >
> > Keep in mind that:
> >
> >  - limiting each job array to 10 jobs is inefficient: when the first job
> >    array finishes (long before the last one), only 80 GPUs will be used,
> >    and usage keeps dropping as the other arrays finish;
> >  - the admin is not available, so he cannot be asked to set a hard limit
> >    of 90 jobs for user 2 just for today.
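The inefficiency in the first point can be checked with a small simulation. This is a sketch under assumptions: the per-script durations below are hypothetical (between 1 and 5 hours, averaging 4 hours, as in the example), every run of a given script takes the same time, and scheduling is greedy (each job starts on the earliest free GPU). With 9 arrays capped at 10 each, the last array ends at 20 h and usage falls to 80 GPUs after 4 h. With one shared cap of 90 (longest jobs first), everything ends at 17 h, against an ideal of 360 × 4 / 90 = 16 h.

```python
import heapq
from typing import List

# Hypothetical per-script durations in hours (1 to 5 h, mean 4 h).
DURATIONS = [1, 3, 4, 4, 4, 5, 5, 5, 5]
RUNS_PER_SCRIPT = 40


def makespan(durations: List[float], slots: int) -> float:
    """Greedy list scheduling: each job starts on the earliest free slot."""
    free_at = [0.0] * slots
    heapq.heapify(free_at)
    end = 0.0
    for d in durations:
        finish = heapq.heappop(free_at) + d
        end = max(end, finish)
        heapq.heappush(free_at, finish)
    return end


def per_array_finish_times(cap_per_array: int) -> List[float]:
    """Each script's array runs independently under its own '%' cap."""
    return [makespan([d] * RUNS_PER_SCRIPT, cap_per_array) for d in DURATIONS]


def shared_cap_finish(cap: int) -> float:
    """All 360 jobs share one cap (e.g. singleton names), longest jobs first."""
    jobs = sorted((d for d in DURATIONS for _ in range(RUNS_PER_SCRIPT)), reverse=True)
    return makespan(jobs, cap)


if __name__ == "__main__":
    print("per-array cap of 10:", per_array_finish_times(10))
    print("shared cap of 90:", shared_cap_finish(90))
```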
> >
> > I would be happy to use job arrays if they allow me to set an
> > overarching job limit across multiple arrays. Perhaps this is doable.
> > Admittedly, I'm working on a paper to be submitted in a few days, so I
> > don't have time to test job arrays thoroughly, but I will try them out in
> > more depth once I've submitted my paper (i.e., after Sept 5).
> >
> > My solution, for now, is not to use job arrays. Instead, I launch each
> > job individually and use the singleton dependency (by giving every job
> > one of the same 90 unique names) to ensure that exactly 90 jobs run at a
> > time (in this case, corresponding to 90 GPUs in use).
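A minimal sketch of the submission side of this approach, assuming hypothetical script names `test_0.sh` to `test_8.sh` that take the repetition number as an argument and a hypothetical `regress` name prefix. Each of the 360 jobs gets one of 90 names round-robin and `--dependency=singleton`, so Slurm runs at most one job per name at a time. Unlike a single array, each `sbatch` call could also carry its own partition or time limit.

```python
from typing import List


def singleton_submissions(scripts: List[str], repeats: int, max_jobs: int,
                          prefix: str = "regress") -> List[List[str]]:
    """Build one sbatch argv per job, cycling through `max_jobs` job names."""
    cmds = []
    n = 0
    for script in scripts:
        for rep in range(repeats):
            name = f"{prefix}_{n % max_jobs}"  # round-robin over the names
            cmds.append(["sbatch", f"--job-name={name}", "--dependency=singleton",
                         "--gres=gpu:1", script, str(rep)])
            n += 1
    return cmds


if __name__ == "__main__":
    scripts = [f"test_{k}.sh" for k in range(9)]  # hypothetical scripts
    for argv in singleton_submissions(scripts, repeats=40, max_jobs=90)[:3]:
        print(" ".join(argv))  # review first; pass argv to subprocess.run to submit
```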
> >
> > Side note: picking Sunday as an example might make the admin's
> > unavailability sound contrived, but it's in fact very typical. The admin
> > is not available:
> >
> >  - on weekends (the present example)
> >  - at any time outside of 9am to 5pm (keep in mind, this is a cluster
> >    used by students in different time zones)
> >  - any time he is on vacation
> >  - any time he is looking after his many other responsibilities.
> >
> > Constantly setting user limits that change on a daily basis would be too
> > much to ask.
> >
> >
> > I'd be happy if you corrected my misunderstandings, especially if you
> > could show me how to set a job limit that takes effect across multiple
> > job arrays.
> >
> > I may have some glaring oversights, as I don't necessarily have a
> > big-picture view of things (most notably, I've never been an admin), so
> > feel free to poke holes in the way I've constructed things.
> >
> > Regards,
> > Guillaume.
> >
> >
> > On Fri, Aug 30, 2019 at 1:22 AM Steven Dick <kg4ydw at gmail.com> wrote:
> >>
> >> This makes no sense and seems backwards to me.
> >>
> >> When you submit an array job, you can specify how many jobs from the
> >> array you want to run at once.
> >> So, an administrator can create a QOS that explicitly limits the user.
> >> However, you keep saying that they probably won't modify the system
> >> for just you...
> >>
> >> That seems to me to be the perfect case to use array jobs and tell it
> >> how many elements of the array to run at once.
> >> You're avoiding array jobs for exactly the wrong reason.
> >>
> >> On Tue, Aug 27, 2019 at 1:19 PM Guillaume Perrault Archambault
> >> <gperr050 at uottawa.ca> wrote:
> >> > The reason I don't use job arrays is to be able to limit the number of
> >> > jobs per user
> >>
>
>