[slurm-users] ticking time bomb? launching too many jobs in parallel
Paul Edmon
pedmon at cfa.harvard.edu
Mon Aug 26 14:13:05 UTC 2019
We've hit this before due to RPC saturation on the controller. I highly
recommend setting max_rpc_cnt and/or defer in SchedulerParameters
(slurm.conf). That should help alleviate this problem.
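
On the submitting side, one option is to cap how many sbatch calls are in
flight at once instead of forking them all. Below is a minimal sketch, not
taken from the toolkit: submit_throttled and max_parallel are hypothetical
names, the default of 4 is an arbitrary placeholder, and the only Slurm
behaviour assumed is that "sbatch <script>" submits one job.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def sbatch_submit(script):
    """Submit one batch script with sbatch and return its stdout."""
    result = subprocess.run(
        ["sbatch", script], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def submit_throttled(scripts, max_parallel=4, submit=sbatch_submit):
    """Submit scripts with at most max_parallel submissions in flight.

    max_parallel is a hypothetical client-side knob, not a Slurm setting.
    Results come back in the same order as the input scripts.
    """
    # The pool size bounds how many submit() calls run concurrently.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(submit, scripts))
```

Something like submit_throttled(job_scripts, max_parallel=8) keeps most of
the speedup from parallel submission while bounding the number of
simultaneous RPCs a single user sends to the controller. A suitable value
depends on the site; comparing the RPC statistics reported by sdiag before
and after a batch is one way to judge it.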
-Paul Edmon-
On 8/26/19 2:12 AM, Guillaume Perrault Archambault wrote:
> Hello,
>
> I wrote a regression-testing toolkit to manage large numbers of SLURM
> jobs and their output (the toolkit can be found here
> <https://github.com/gobbedy/slurm_simulation_toolkit/> if anyone is
> interested).
>
> To make job launching faster, sbatch commands are forked, so that
> numerous jobs may be submitted in parallel.
>
> We (the cluster admin and myself) are concerned that this may cause
> unresponsiveness for other users.
>
> I cannot say for sure since I don't have visibility over all users of
> the cluster, but unresponsiveness doesn't seem to have occurred so
> far. That being said, the fact that it hasn't occurred yet doesn't
> mean it won't in the future. So I'm treating this as a ticking time
> bomb to be fixed asap.
>
> My questions are the following:
> 1) Does anyone have experience with large numbers of jobs submitted in
> parallel? What are the limits that can be hit? For example is there
> some hard limit on how many jobs a SLURM scheduler can handle before
> blacking out / slowing down?
> 2) Is there a way for me to find/measure/ping this resource limit?
> 3) How can I make sure I don't hit this resource limit?
>
> From what I've observed, parallel submission can improve submission
> time by a factor of at least 10. This can make a big difference in
> users' workflows.
>
> For that reason I would like to fall back to launching jobs
> sequentially only as a last resort.
>
> Thanks in advance.
>
> Regards,
> Guillaume.