[slurm-users] ticking time bomb? launching too many jobs in parallel
Brian Andrus
toomuchit at gmail.com
Tue Aug 27 16:52:09 UTC 2019
Here is where you may want to look into slurmdbd and sacctmgr. Then you can
create a QOS that has MaxJobsPerUser set, to limit the total number of jobs
running on a per-user basis:
https://slurm.schedmd.com/resource_limits.html
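
For example, roughly along these lines (untested sketch; the QOS name
"throttle", the user name and the limit of 3 are placeholders, and the
exact sacctmgr syntax can vary between Slurm versions):

    # create a QOS with a per-user running-job limit (needs slurmdbd/accounting)
    sacctmgr add qos throttle
    sacctmgr modify qos throttle set MaxJobsPerUser=3
    # allow a user's association to use the QOS
    sacctmgr modify user name=someuser set qos+=throttle
    # jobs are then submitted against it
    sbatch --qos=throttle my.sbatch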
Brian Andrus
On 8/27/2019 9:38 AM, Guillaume Perrault Archambault wrote:
> Hi Paul,
>
> Your comment confirms my worst fear, that I should either implement
> job arrays or stick to a sequential for loop.
>
> My problem with job arrays is that, as far as I understand them, they
> cannot be used with singleton to set a max job limit.
>
> I use singleton to limit the number of jobs a user can be running at a
> time. For example, if the limit is 3 jobs per user and the user
> launches 10 jobs, the sbatch submissions via my scripts may look like this:
> sbatch --job-name=job1 [OPTIONS SET1] --dependency=singleton my.sbatch
> sbatch --job-name=job2 [OPTIONS SET1] --dependency=singleton my.sbatch
> sbatch --job-name=job3 [OPTIONS SET1] --dependency=singleton my.sbatch
> sbatch --job-name=job1 [OPTIONS SET1] --dependency=singleton my.sbatch
> sbatch --job-name=job2 [OPTIONS SET1] --dependency=singleton my.sbatch
> sbatch --job-name=job3 [OPTIONS SET2] --dependency=singleton my.sbatch2
> sbatch --job-name=job1 [OPTIONS SET2] --dependency=singleton my.sbatch2
> sbatch --job-name=job2 [OPTIONS SET2] --dependency=singleton my.sbatch2
> sbatch --job-name=job3 [OPTIONS SET2] --dependency=singleton my.sbatch2
> sbatch --job-name=job1 [OPTIONS SET2] --dependency=singleton my.sbatch2
>
> This way, at most 3 jobs will run at a time (i.e. one job named job1,
> one named job2, and one named job3).
>
> Notice that my example provides two option sets to sbatch, so it would
> be suitable for conversion to two job arrays.
>
> This is the problem I can't overcome.
>
> In the job array documentation, I see:
> A maximum number of simultaneously running tasks from the job array
> may be specified using a "%" separator. For example "--array=0-15%4"
> will limit the number of simultaneously running tasks from this job
> array to 4.
>
> But this '%' separator cannot specify a max number of tasks over two
> (or more) separate job arrays, as far as I can tell.
>
> And the job array element names cannot be made to modulo-rotate the way
> they do in my example above.
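>
> (For concreteness, a submission loop that produces this kind of modulo
> rotation might look roughly like this untested sketch, where MAX_JOBS
> and the OPTION_SETS array are placeholders:)
>
>     MAX_JOBS=3
>     i=0
>     for opts in "${OPTION_SETS[@]}"; do
>         name="job$(( i % MAX_JOBS + 1 ))"
>         # $opts is left unquoted on purpose so each option set expands
>         # into separate sbatch flags
>         sbatch --job-name="$name" $opts --dependency=singleton my.sbatch
>         i=$(( i + 1 ))
>     done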
>
> Perhaps I need to play more with job arrays, and try harder to find a
> solution to limit the number of jobs across multiple arrays. Or ask this
> question in a separate post, since it's a bit off topic.
>
> In any case, thanks so much for answering my question. I think it
> answers my original post perfectly :)
>
> Regards,
> Guillaume.
>
> On Tue, Aug 27, 2019 at 10:08 AM Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>
> At least for our cluster, we generally recommend that if you are
> submitting large numbers of jobs you either use a job array or you
> just loop over the jobs you want to submit. A fork bomb is
> definitely not recommended. For highest-throughput submission a
> job array is your best bet, since a single submission generates
> thousands of jobs which the scheduler can then handle sensibly.
> So I highly recommend using job arrays.
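>
> For example, an untested sketch of an array script (the program name and
> the counts are made up) that submits 1000 tasks in one shot while letting
> at most 4 run at once:
>
>     #!/bin/bash
>     #SBATCH --job-name=myarray
>     #SBATCH --array=0-999%4
>     # each array task gets its own index via SLURM_ARRAY_TASK_ID
>     ./my_program --task-id "${SLURM_ARRAY_TASK_ID}"
>
> and is submitted with a single "sbatch my_array.sbatch".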
>
> -Paul Edmon-
>
> On 8/27/19 3:45 AM, Guillaume Perrault Archambault wrote:
>> Hi Paul,
>>
>> Thanks a lot for your suggestion.
>>
>> The cluster I'm using has thousands of users, so I'm doubtful the
>> admins will change this setting just for me. But I'll mention it
>> to the support team I'm working with.
>>
>> I was hoping more for something that can be done on the user end.
>>
>> Is there some way for a user to measure whether the scheduler
>> is in RPC saturation? If it is, I could then make sure my
>> script doesn't launch too many jobs in parallel.
>>
>> Sorry if my question is too vague; I don't understand the backend
>> of the SLURM scheduler very well, so my questions use the
>> limited terminology of a user.
>>
>> My concern is just to make sure that my scripts don't send out
>> more commands (simultaneously) than the scheduler can handle.
>>
>> For example, as an extreme scenario, suppose a user forks off
>> 1000 sbatch commands in parallel: is that more than the scheduler
>> can handle? As a user, how can I know whether it is?
>>
>> Regards,
>> Guillaume.
>>
>>
>>
>> On Mon, Aug 26, 2019 at 10:15 AM Paul Edmon
>> <pedmon at cfa.harvard.edu> wrote:
>>
>> We've hit this before due to RPC saturation. I highly
>> recommend using max_rpc_cnt and/or defer for scheduling.
>> That should help alleviate this problem.
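>>
>> Both are set through SchedulerParameters in slurm.conf, roughly like
>> this untested sketch (the value 150 is only an illustrative guess and
>> should be tuned for your site):
>>
>>     SchedulerParameters=defer,max_rpc_cnt=150
>>
>> followed by an "scontrol reconfigure" to pick up the change.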
>>
>> -Paul Edmon-
>>
>> On 8/26/19 2:12 AM, Guillaume Perrault Archambault wrote:
>>> Hello,
>>>
>>> I wrote a regression-testing toolkit to manage large numbers
>>> of SLURM jobs and their output (the toolkit can be found
>>> here <https://github.com/gobbedy/slurm_simulation_toolkit/>
>>> if anyone is interested).
>>>
>>> To make job launching faster, sbatch commands are forked, so
>>> that numerous jobs may be submitted in parallel.
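>>>
>>> (A minimal sketch of this kind of parallel submission, with a
>>> hypothetical cap on concurrent sbatch calls and a placeholder
>>> OPTION_SETS array of per-job options, could look like this:)
>>>
>>>     MAX_PARALLEL=8   # hypothetical cap on concurrent sbatch calls
>>>     for opts in "${OPTION_SETS[@]}"; do
>>>         sbatch $opts my.sbatch &
>>>         # throttle: pause while too many submissions are in flight
>>>         while (( $(jobs -rp | wc -l) >= MAX_PARALLEL )); do
>>>             wait -n
>>>         done
>>>     done
>>>     wait    # wait for the remaining submissions to finish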
>>>
>>> We (the cluster admin and I) are concerned that this
>>> may cause unresponsiveness for other users.
>>>
>>> I cannot say for sure since I don't have visibility over all
>>> users of the cluster, but unresponsiveness doesn't seem to
>>> have occurred so far. That being said, the fact that it
>>> hasn't occurred yet doesn't mean it won't in the future. So
>>> I'm treating this as a ticking time bomb to be fixed asap.
>>>
>>> My questions are the following:
>>> 1) Does anyone have experience with large numbers of jobs
>>> submitted in parallel? What are the limits that can be hit?
>>> For example, is there some hard limit on how many jobs a
>>> SLURM scheduler can handle before blacking out / slowing down?
>>> 2) Is there a way for me to find/measure/ping this resource
>>> limit?
>>> 3) How can I make sure I don't hit this resource limit?
>>>
>>> From what I've observed, parallel submission can speed up
>>> submission by a factor of at least 10. This can make a
>>> big difference in users' workflows.
>>>
>>> For that reason I would like to keep sequential job launching
>>> only as a last resort.
>>>
>>> Thanks in advance.
>>>
>>> Regards,
>>> Guillaume.
>>