[slurm-users] ticking time bomb? launching too many jobs in parallel

Paul Edmon pedmon at cfa.harvard.edu
Fri Aug 30 19:14:37 UTC 2019


Yes, QoS's are dynamic.
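
For example (the job ID and QOS name below are placeholders, and this assumes
the target QOS is already in the user's allowed QOS list), a pending job's QOS
can be changed with something like:

    scontrol update jobid=12345 qos=gpu90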

-Paul Edmon-

On 8/30/19 2:58 PM, Guillaume Perrault Archambault wrote:
> Hi Paul,
>
> Thanks for your pointers.
>
> I'll look into QOS and MCS after my paper deadline (Sept 5). Re
> QOS, as expressed to Peter in the reply I just sent, I wonder
> whether the QOS of a job can be changed while it's pending
> (submitted but not yet running).
>
> Regards,
> Guillaume.
>
> On Fri, Aug 30, 2019 at 10:24 AM Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>
>     A QoS is probably your best bet.  Another option is MCS, which
>     you can use to help reduce resource fragmentation.  For limits,
>     though, a QoS is the way to go.
>
>     -Paul Edmon-
>
>     On 8/30/19 7:33 AM, Steven Dick wrote:
>     > It would still be possible to use job arrays in this situation; it's
>     > just slightly messy.
>     > The way a job array works is that you submit a single script, and
>     > that script is provided an integer for each subjob.  The integer is
>     > in a range, with an optional step (default=1).
>     >
>     > To handle the situation you describe, you would have to predetermine
>     > how many of each test you want to run (i.e., you couldn't dynamically
>     > change the number of jobs that run within one array), and a master
>     > script would map the integer range to the job to be started.
>     >
>     > The most trivial way to do it would be to put the list of regressions
>     > in a text file and the master script would index it by line number
>     > and then run the appropriate command.
>     > A more complex way would be to do some math (a divide?) to get the
>     > script name and subindex (modulus?) for each regression.
>     >
>     > Both of these would require some semi-advanced scripting, but nothing
>     > that couldn't be cut and pasted with some trivial modifications for
>     > each job set.
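>     >
>     > A rough sketch of the text-file approach (untested; the file name,
>     > array range, and GPU request below are placeholders):
>     >
>     >     #!/bin/bash
>     >     #SBATCH --job-name=regression
>     >     #SBATCH --gres=gpu:1
>     >     # 360 subjobs, with at most 90 running at once (the %90 throttle)
>     >     #SBATCH --array=1-360%90
>     >     # commands.txt holds one regression command per line (360 lines)
>     >     CMD=$(sed -n "${SLURM_ARRAY_TASK_ID}p" commands.txt)
>     >     eval "$CMD"
>     >
>     > Note the %90 throttle only applies within that one array, which is
>     > why a single 360-element array is used here rather than 9 separate
>     > ones.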
>     >
>     > As to the unavailability of the admin ...
>     > An alternate approach, one that would require the admin's help, would
>     > be to come up with a small set of allocations (e.g., 40 GPUs, 80 GPUs,
>     > 100 GPUs, etc.) and make a QOS for each one with a GPU limit (e.g.,
>     > maxtrespu=gpu=40).  Then the user would assign that QOS to the job
>     > when starting it to set the overall allocation for all the jobs.  The
>     > admin wouldn't need to set this up more than once; the user just
>     > picks which QOS to use.
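>     >
>     > For illustration only (the QOS name, user name, and exact TRES string
>     > are site-dependent; many sites track GPUs as gres/gpu rather than
>     > plain gpu), the one-time setup might look roughly like:
>     >
>     >     # admin, once per allocation size:
>     >     sacctmgr add qos gpu40
>     >     sacctmgr modify qos gpu40 set MaxTRESPerUser=gres/gpu=40
>     >     # the QOS must also be in the user's association QOS list:
>     >     sacctmgr modify user name=someuser set qos+=gpu40
>     >
>     >     # user, per job set: pick whichever QOS matches the agreed share
>     >     sbatch --qos=gpu40 --gres=gpu:1 my_job.sh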
>     >
>     > On Fri, Aug 30, 2019 at 2:36 AM Guillaume Perrault Archambault
>     > <gperr050 at uottawa.ca> wrote:
>     >> Hi Steven,
>     >>
>     >> Thanks for taking the time to reply to my post.
>     >>
>     >> Setting a limit on the number of jobs for a single array isn't
>     sufficient, because a regression test needs to launch multiple
>     arrays, and I would need a job limit that applies across all
>     launched jobs.
>     >>
>     >> It's very possible I'm not understanding something. I'll lay out a
>     very specific example in the hope that you can correct me if I've gone
>     wrong somewhere.
>     >>
>     >> Let's take the small cluster with 140 GPUs and no fairshare as
>     an example, because it's easier for me to explain.
>     >>
>     >> The users, who all know each other personally and interact via
>     chat, decide on a daily basis how many jobs each user can run at a
>     time.
>     >>
>     >> Let's say today is Sunday (hypothetically). Nobody is actively
>     developing today, except that user 1 has 10 jobs running for the
>     entire weekend. That leaves 130 GPUs unused.
>     >>
>     >> User 2, whose jobs all run on 1 GPU each, decides to run a regression
>     test. The regression test comprises 9 different scripts, each run 40
>     times, for a grand total of 360 jobs. The scripts take between 1 and
>     5 hours to complete, 4 hours on average.
>     >>
>     >> User 2 gets the user group's approval (via chat) to use 90 GPUs
>     (so that 40 GPUs will remain for anyone else wanting to work that
>     day).
>     >>
>     >> The problem I'm trying to solve is this: how do I ensure that
>     user 2 launches his 360 jobs in such a way that 90 jobs are in the
>     run state consistently until the regression test is finished?
>     >>
>     >> Keep in mind that:
>     >>
>     >> - limiting each job array to 10 jobs is inefficient: when the first
>     job array finishes (long before the last one), only 80 GPUs will be
>     used, and so on as the other arrays finish;
>     >> - the admin is not available, so he cannot be asked to set a hard
>     limit of 90 jobs for user 2 just for today.
>     >>
>     >> I would be happy to use job arrays if they allow me to set an
>     overarching job limit across multiple arrays. Perhaps this is
>     doable. Admittedly, I'm working on a paper to be submitted in a few
>     days, so I don't have time to test job arrays right now, but I
>     will try them out more thoroughly once I've submitted my
>     paper (i.e., after Sept 5).
>     >>
>     >> My solution, for now, is to not use job arrays. Instead, I
>     launch each job individually and use singleton dependencies (cycling
>     all the jobs through the same pool of 90 unique job names) to ensure
>     that exactly 90 jobs run at a time (in this case, corresponding to 90
>     GPUs in use).
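>     >>
>     >> As a concrete sketch of that approach (the COMMANDS array, the
>     >> job-name prefix, and the GPU request are placeholders; the pool of
>     >> 90 names is what caps concurrency at 90):
>     >>
>     >>     # COMMANDS is assumed to hold the 360 regression commands
>     >>     i=0
>     >>     for cmd in "${COMMANDS[@]}"; do
>     >>         # --dependency=singleton allows only one running job per
>     >>         # job name, so 90 distinct names => at most 90 running jobs
>     >>         sbatch --gres=gpu:1 \
>     >>                --job-name="regress$((i % 90))" \
>     >>                --dependency=singleton \
>     >>                --wrap="$cmd"
>     >>         i=$((i + 1))
>     >>     done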
>     >>
>     >> Side note: the unavailability of the admin might sound
>     contrived because I picked Sunday as an example, but it's in fact very
>     typical. The admin is not available:
>     >>
>     >> - on weekends (the present example);
>     >> - at any time outside of 9am to 5pm (keep in mind, this is a
>     cluster used by students in different time zones);
>     >> - any time he is on vacation;
>     >> - any time he is looking after his many other responsibilities.
>     >>
>     >> Constantly setting user limits that change on a daily basis would be
>     too much to ask.
>     >>
>     >>
>     >> I'd be happy if you corrected my misunderstandings, especially
>     if you could show me how to set a job limit that takes effect over
>     multiple job arrays.
>     >>
>     >> I may have some glaring oversights, as I don't necessarily have
>     a big-picture view of things (most notably, I've never been an admin),
>     so feel free to poke holes in the way I've set things up.
>     >>
>     >> Regards,
>     >> Guillaume.
>     >>
>     >>
>     >> On Fri, Aug 30, 2019 at 1:22 AM Steven Dick <kg4ydw at gmail.com> wrote:
>     >>> This makes no sense and seems backwards to me.
>     >>>
>     >>> When you submit an array job, you can specify how many jobs from
>     >>> the array you want to run at once.
>     >>> Alternatively, an administrator can create a QOS that explicitly
>     >>> limits the user.
>     >>> However, you keep saying that they probably won't modify the
>     >>> system just for you...
>     >>>
>     >>> That seems to me to be the perfect case for using array jobs and
>     >>> telling Slurm how many elements of the array to run at once.
>     >>> You're avoiding array jobs for exactly the wrong reason.
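>     >>>
>     >>> For example (the array range and throttle value are placeholders),
>     >>> the %N suffix on --array caps how many elements run at once:
>     >>>
>     >>>     sbatch --array=1-40%10 run_regression.sh   # at most 10 at a time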
>     >>>
>     >>> On Tue, Aug 27, 2019 at 1:19 PM Guillaume Perrault Archambault
>     >>> <gperr050 at uottawa.ca> wrote:
>     >>>> The reason I don't use job arrays is to be able to limit the
>     number of jobs per user
>

