[slurm-users] ticking time bomb? launching too many jobs in parallel

Brian Andrus toomuchit at gmail.com
Tue Aug 27 16:47:20 UTC 2019

Just a couple comments from experience in general:

1) If you can, use xargs or GNU parallel to do the forking, so you can 
limit the number of simultaneous submissions.
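As a minimal sketch of that pattern (the job-script filenames are hypothetical, and `echo` stands in for `sbatch` so you can dry-run it anywhere):

```shell
# -P 4 caps concurrency at four submissions at a time; -n 1 passes one
# script per invocation. Drop the 'echo' to actually submit with sbatch.
printf 'job_%d.sbatch\n' 1 2 3 | xargs -P 4 -n 1 echo sbatch
```

GNU parallel offers the same cap via its -j flag (e.g. `parallel -j 4 sbatch ::: *.sbatch`).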

2) I have yet to see a case where many separate jobs are a better idea 
than a job array, when an array can do the work.

     If you can prep a proper input file for a script, a single 
submission is all it takes. Then you can control how many tasks run 
concurrently (the array task throttle) and change that to scale up/down.
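A sketch of such an array submission script (the input file `params.txt` and the program it drives are hypothetical placeholders):

```shell
#!/bin/bash
#SBATCH --array=1-100%10        # 100 tasks, but at most 10 running at once
#SBATCH --output=task_%a.log    # %a expands to the array task index

# Each task picks up its own line of arguments from a prepared input file.
ARGS=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)
srun ./my_program $ARGS
```

The `%10` throttle can be raised or lowered on a pending array with `scontrol update JobId=<jobid> ArrayTaskThrottle=<n>`, so you can scale without resubmitting.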

Brian Andrus

On 8/25/2019 11:12 PM, Guillaume Perrault Archambault wrote:
> Hello,
> I wrote a regression-testing toolkit to manage large numbers of SLURM 
> jobs and their output (the toolkit can be found here 
> <https://github.com/gobbedy/slurm_simulation_toolkit/> if anyone is 
> interested).
> To make job launching faster, sbatch commands are forked, so that 
> numerous jobs may be submitted in parallel.
> We (the cluster admin and myself) are concerned that this may cause 
> unresponsiveness for other users.
> I can't say for sure, since I don't have visibility into all users of 
> the cluster, but unresponsiveness doesn't seem to have occurred so 
> far. That said, the fact that it hasn't happened yet doesn't mean it 
> won't. So I'm treating this as a ticking time bomb to be defused ASAP.
> My questions are the following:
> 1) Does anyone have experience with large numbers of jobs submitted in 
> parallel? What are the limits that can be hit? For example is there 
> some hard limit on how many jobs a SLURM scheduler can handle before 
> blacking out / slowing down?
> 2) Is there a way for me to find or measure this resource limit?
> 3) How can I make sure I don't hit this resource limit?
> From what I've observed, parallel submission can improve submission 
> time by a factor of at least 10. This can make a big difference in 
> users' workflows.
> For that reason, I would like to fall back to sequential job 
> launching only as a last resort.
> Thanks in advance.
> Regards,
> Guillaume.
