[slurm-users] Can frequent hold-release adversely affect slurm?

Eli V eliventer at gmail.com
Thu Oct 18 11:34:28 MDT 2018


On Thu, Oct 18, 2018 at 1:03 PM Daniel Letai <dani at letai.org.il> wrote:
>
>
> Hello all,
>
>
> We need to execute a large number of job arrays (~10k arrays, each with at most 8M elements), all with the same priority, with minimal starvation of any single array - we don't want to wait for each array to complete before starting the next one. To implement this "interleaving" between arrays, we came up with the following scheme:
>
>
> Start all arrays in this partition in a "Hold" state.
>
> Release a predefined number of elements (e.g., 200)
>
> From this point on, a slurmctld prolog takes over:
>
> When the 200th job starts, run squeue and note the next job array (the array id following the currently executing array id)
>
> Release a predefined number of elements (e.g., 200)
>
> and repeat
>
>
> This might produce a very large number of release requests to the scheduler in a short time frame, and one concern is that the scheduler loop could be overloaded by so many requests.
>
> Can you think of other issues that might come up with this approach?
>
>
> Do you have any recommendations, or might suggest a better approach to solve this problem?

I can't comment on the scalability issues, but if possible, using %200
on the array submission seems like the simplest solution. From the
sbatch man page:

    For example "--array=0-15%4" will limit the number of simultaneously
    running tasks from this job array to 4.
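In practice that might look something like the following (a rough
sketch; the script name, array range, and job id are placeholders):

    # Let Slurm itself cap each array at 200 concurrently running tasks,
    # instead of releasing held elements by hand:
    sbatch --array=0-9999%200 process_element.sh

    # The throttle can also be changed on an already submitted array:
    scontrol update JobId=<array_job_id> ArrayTaskThrottle=200

With every array throttled this way, each one is limited to 200 running
tasks, so the interleaving should fall out of normal scheduling without
any external hold/release traffic (assuming the cluster can run more
than 200 tasks at a time).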

>
> We have considered fairshare, but all arrays are from the same account and user. We have also considered creating accounts on the fly (one for each array), but we get an error ("This should never happen") after creating a few thousand accounts.
>
> To my understanding, fairshare is only viable between accounts.
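If you do go ahead with the hold/release scheme described above, the
PrologSlurmctld side might look roughly like the following. This is
only an untested sketch: RELEASE_BATCH, the counter directory, and the
assumption that scontrol can release individual array elements by
jobid_taskid are illustrative, and the availability of
SLURM_ARRAY_JOB_ID in the ctld prolog environment should be verified
against your Slurm version.

    #!/bin/bash
    # Sketch of a PrologSlurmctld hook for the interleaving idea: count
    # started elements of the current array and, every RELEASE_BATCH-th
    # start, release a batch of held elements from the next array in line.

    RELEASE_BATCH=200
    COUNTER_DIR=/var/spool/slurm/interleave   # must exist, writable by slurmctld

    # Only act on array elements (SLURM_ARRAY_JOB_ID is set for them).
    [ -n "$SLURM_ARRAY_JOB_ID" ] || exit 0

    count_file="$COUNTER_DIR/$SLURM_ARRAY_JOB_ID"
    count=$(( $(cat "$count_file" 2>/dev/null || echo 0) + 1 ))
    echo "$count" > "$count_file"

    if [ $(( count % RELEASE_BATCH )) -eq 0 ]; then
        # Next pending array id greater than the currently running one.
        next=$(squeue -h -t PD -o "%F" | sort -n -u \
                 | awk -v cur="$SLURM_ARRAY_JOB_ID" '$1+0 > cur+0 { print; exit }')
        if [ -n "$next" ]; then
            # Release up to RELEASE_BATCH held elements of that array.
            squeue -h -r -t PD -j "$next" -o "%i" | head -n "$RELEASE_BATCH" \
                | xargs -r scontrol release
        fi
    fi

Note that the squeue/scontrol calls inside the prolog are exactly the
kind of extra scheduler traffic you were worried about, so the array
throttle above is probably still the simpler route.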


