[slurm-users] ticking time bomb? launching too many jobs in parallel

Tue Aug 27 16:52:09 UTC 2019

Here is where you may want to look into slurmdbd and sacctmgr.

Then you can create a qos that has MaxJobsPerUser to limit the total 
number running on a per-user basis: 
https://slurm.schedmd.com/resource_limits.html
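
As a rough sketch of what that could look like on the admin side: the helper below only prints the sacctmgr commands for review, it does not run them. The function name and its arguments (a QOS name, a job limit, a user name) are made up for illustration. Enforcement also assumes the cluster's AccountingStorageEnforce setting includes limits, so check the page above against your Slurm version.

```shell
# Print (do not run) sacctmgr commands that create a QOS capping the number
# of running jobs per user, then make that QOS available to one user.
# Arguments (all hypothetical): QOS name, max running jobs per user, user name.
qos_limit_commands() {
    printf 'sacctmgr add qos %s\n' "$1"
    printf 'sacctmgr modify qos %s set MaxJobsPerUser=%s\n' "$1" "$2"
    printf 'sacctmgr modify user %s set qos+=%s\n' "$3" "$1"
}
```

For example, "qos_limit_commands limit3 3 alice" prints three commands. Jobs submitted with "sbatch --qos=limit3" should then count against a single per-user limit, whether they come from one job array, several arrays, or individual submissions.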

Brian Andrus

On 8/27/2019 9:38 AM, Guillaume Perrault Archambault wrote:
> Hi Paul,
>
> Your comment confirms my worst fear, that I should either implement 
> job arrays or stick to a sequential for loop.
>
> My problem with job arrays is that, as far as I understand them, they 
> cannot be used with singleton to set a max job limit.
>
> I use singleton to limit the number of jobs a user can be running at a 
> time. For example if the limit is 3 jobs per user and the user 
> launches 10 jobs, the sbatch submissions via my scripts may look like this:
> sbatch --job-name=job1 [OPTIONS SET1] --dependency=singleton my.sbatch
> sbatch --job-name=job2 [OPTIONS SET1] --dependency=singleton my.sbatch
> sbatch --job-name=job3 [OPTIONS SET1] --dependency=singleton my.sbatch
> sbatch --job-name=job1 [OPTIONS SET1] --dependency=singleton my.sbatch
> sbatch --job-name=job2 [OPTIONS SET1] --dependency=singleton my.sbatch
> sbatch --job-name=job3 [OPTIONS SET2] --dependency=singleton my.sbatch2
> sbatch --job-name=job1 [OPTIONS SET2] --dependency=singleton my.sbatch2
> sbatch --job-name=job2 [OPTIONS SET2] --dependency=singleton my.sbatch2
> sbatch --job-name=job3 [OPTIONS SET2] --dependency=singleton my.sbatch2
> sbatch --job-name=job1 [OPTIONS SET2] --dependency=singleton my.sbatch2
>
> This way, at most 3 jobs will run at a time (i.e., a job with name job1,
> a job with name job2, and a job with name job3).
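
A minimal sketch of the name rotation described above, for readers who want to reproduce it. The function name and its argument order are hypothetical; only --job-name and --dependency=singleton are real sbatch options. Singleton makes a job wait until earlier jobs with the same name and user have finished, so cycling through N names caps the user at N running jobs.

```shell
# Submit TOTAL jobs, cycling the job name over job1..jobMAX so that
# --dependency=singleton lets at most MAX of them run at the same time.
# Usage: submit_rotating MAX TOTAL [sbatch options...] script
submit_rotating() {
    max_running="$1"
    total="$2"
    shift 2
    i=0
    while [ "$i" -lt "$total" ]; do
        # Slot numbers go 1, 2, ..., MAX, 1, 2, ... (modulo rotation).
        slot=$(( i % max_running + 1 ))
        sbatch --job-name="job${slot}" --dependency=singleton "$@"
        i=$(( i + 1 ))
    done
}
```

For instance, "submit_rotating 3 5 my.sbatch" would issue five submissions named job1, job2, job3, job1, job2. Note that this submits sequentially, one sbatch call after another.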
>
> Notice that my example has two option sets provided to sbatch, so the 
> example would be suitable for conversion to two Job Arrays.
>
> This is the problem I can't overcome.
>
> In the job array documentation, I see
> A maximum number of simultaneously running tasks from the job array 
> may be specified using a "%" separator. For example "--array=0-15%4" 
> will limit the number of simultaneously running tasks from this job 
> array to 4.
>
> But this '%' separator cannot specify a max number of tasks over two 
> (or more) separate job arrays, as far as I can tell.
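
For reference, a small sketch of the per-array throttle quoted above. The helper names and the params file (one option set per line) are assumptions for illustration; --array with the "%" separator and the SLURM_ARRAY_TASK_ID variable are standard Slurm. The limit applies to each array separately, so two arrays submitted with %3 could have up to 6 tasks running in total.

```shell
# Submit one job array of NTASKS tasks, with at most LIMIT running at once.
# Usage: submit_array NTASKS LIMIT [sbatch options...] script
submit_array() {
    ntasks="$1"
    limit="$2"
    shift 2
    sbatch --array="0-$(( ntasks - 1 ))%${limit}" "$@"
}

# Inside the batch script: print this task's option set from a hypothetical
# params file with one option set per line (task 0 reads line 1).
task_params() {
    sed -n "$(( ${SLURM_ARRAY_TASK_ID:-0} + 1 ))p" "$1"
}
```

With these, "submit_array 16 4 my.sbatch" runs sbatch with --array=0-15%4, and each task can call task_params on the shared file to find its own settings.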
>
> And the job array element names cannot be made to modulo rotate in the 
> way they do in my above example.
>
> Perhaps I need to play more with job arrays, and try harder to find a 
> solution to limit number of jobs across multiple arrays. Or ask this 
> question in a separate post, since it's a bit off topic.
>
> In any case, thanks so much for answering my question. I think it answers
> my original post perfectly :)
>
> Regards,
> Guillaume.
>
> On Tue, Aug 27, 2019 at 10:08 AM Paul Edmon <pedmon at cfa.harvard.edu>
> wrote:
>
>     At least for our cluster we generally recommend that if you are
>     submitting large numbers of jobs you either use a job array or you
>     just use a for loop over the jobs you want to submit.  A fork bomb is
>     definitely not recommended.  For highest throughput submission a
>     job array is your best bet as in one submission it will generate
>     thousands of jobs which the scheduler can then handle sensibly.
>     So I highly recommend using job arrays.
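
A minimal sketch of the for-loop alternative, assuming each job differs only in its sbatch options. The function name is hypothetical, and the option strings in the example are placeholders. Each sbatch call returns before the next one starts, so the scheduler never sees more than one submission from this script at a time.

```shell
# Submit the same script once per option set, strictly one after another.
# Usage: submit_sequentially script "opts for job 1" "opts for job 2" ...
submit_sequentially() {
    script="$1"
    shift
    for opts in "$@"; do
        # $opts is left unquoted on purpose so that "--mem=1G --time=10"
        # splits into separate sbatch arguments (avoid glob characters).
        sbatch $opts "$script"
    done
}
```

For example, "submit_sequentially my.sbatch '--mem=1G' '--mem=2G --time=10'" issues two submissions in order.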
>
>     -Paul Edmon-
>
>     On 8/27/19 3:45 AM, Guillaume Perrault Archambault wrote:
>>     Hi Paul,
>>
>>     Thanks a lot for your suggestion.
>>
>>     The cluster I'm using has thousands of users, so I'm doubtful the
>>     admins will change this setting just for me. But I'll mention it
>>     to the support team I'm working with.
>>
>>     I was hoping more for something that can be done on the user end.
>>
>>     Is there some way for the user to measure whether the scheduler
>>     is in RPC saturation? And then if it is, I could make sure my
>>     script doesn't launch too many jobs in parallel.
>>
>>     Sorry if my question is too vague. I don't understand the backend
>>     of the SLURM scheduler too well, so my questions use the
>>     limited terminology of a user.
>>
>>     My concern is just to make sure that my scripts don't send out
>>     more commands (simultaneously) than the scheduler can handle.
>>
>>     For example, as an extreme scenario, suppose a user forks off
>>     1000 sbatch commands in parallel, is that more than the scheduler
>>     can handle? As a user, how can I know whether it is?
>>
>>     Regards,
>>     Guillaume.
>>
>>
>>
>>     On Mon, Aug 26, 2019 at 10:15 AM Paul Edmon
>>     <pedmon at cfa.harvard.edu> wrote:
>>
>>         We've hit this before due to RPC saturation.  I highly
>>         recommend using max_rpc_cnt and/or defer for scheduling. 
>>         That should help alleviate this problem.
>>
>>         -Paul Edmon-
>>
>>         On 8/26/19 2:12 AM, Guillaume Perrault Archambault wrote:
>>>         Hello,
>>>
>>>         I wrote a regression-testing toolkit to manage large numbers
>>>         of SLURM jobs and their output (the toolkit can be found
>>>         here <https://github.com/gobbedy/slurm_simulation_toolkit/>
>>>         if anyone is interested).
>>>
>>>         To make job launching faster, sbatch commands are forked, so
>>>         that numerous jobs may be submitted in parallel.
>>>
>>>         We (the cluster admin and myself) are concerned that this
>>>         may cause unresponsiveness for other users.
>>>
>>>         I cannot say for sure since I don't have visibility over all
>>>         users of the cluster, but unresponsiveness doesn't seem to
>>>         have occurred so far. That being said, the fact that it
>>>         hasn't occurred yet doesn't mean it won't in the future. So
>>>         I'm treating this as a ticking time bomb to be fixed asap.
>>>
>>>         My questions are the following:
>>>         1) Does anyone have experience with large numbers of jobs
>>>         submitted in parallel? What are the limits that can be hit?
>>>         For example is there some hard limit on how many jobs a
>>>         SLURM scheduler can handle before blacking out / slowing down?
>>>         2) Is there a way for me to find/measure/ping this resource
>>>         limit?
>>>         3) How can I make sure I don't hit this resource limit?
>>>
>>>         From what I've observed, parallel submission can improve
>>>         submission time by a factor of at least 10. This can make a
>>>         big difference in users' workflows.
>>>
>>>         For that reason I would like to keep the option of launching
>>>         jobs sequentially as a last resort.
>>>
>>>         Thanks in advance.
>>>
>>>         Regards,
>>>         Guillaume.
>>