[slurm-users] maximum size of array jobs
Marcus Wagner
wagner at itc.rwth-aachen.de
Tue Feb 26 15:05:22 UTC 2019
Hi Jeffrey,
thanks for the hint regarding scontrol reconfig. That one drove me nuts
again.
I changed it to MaxArraySize=100000 and restarted slurmctld, since I had
also changed some features of the nodes.
I soon realized that I could only submit --array=1-99999, so I
increased MaxArraySize to 100001 myself and did an scontrol
reconfig.
The behaviour was still the same. Now I know why: the new value only
takes effect after a restart of slurmctld :)
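For the archive, here is the off-by-one as a small Python sketch. It relies only on what the job_array documentation quoted below says, namely that the largest usable index is MaxArraySize minus one. The function names are made up for illustration and are not part of Slurm.

```python
def max_array_index(max_array_size: int) -> int:
    """Largest index that --array accepts for a given MaxArraySize."""
    return max_array_size - 1


def required_max_array_size(top_index: int) -> int:
    """Smallest MaxArraySize that permits an array ending at top_index."""
    return top_index + 1


# With MaxArraySize=100000 the top index is 99999, which matches what I saw.
print(max_array_index(100000))
# To allow --array=1-100000 the setting has to be 100001.
print(required_max_array_size(100000))
```

As Jeffrey notes, MaxJobCount probably has to grow along with it, since each array index counts as a job.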
Best,
Marcus
On 2/26/19 3:27 PM, Jeffrey Frey wrote:
> Also see "https://slurm.schedmd.com/slurm.conf.html" for
> MaxArraySize/MaxJobCount.
>
> We just went through a user-requested adjustment to MaxArraySize to
> bump it from 1000 to 10000; as the documentation states, since each
> index of an array job is essentially "a job," you must be sure to also
> adjust MaxJobCount (from 10000 to 100000 in our case).
> Adjusting MaxJobCount requires a restart of slurmctld; though the
> documentation doesn't state it, so does adjustment of MaxArraySize
> (scontrol reconfigure will succeed but leave the previous limit in
> effect, see "https://bugs.schedmd.com/show_bug.cgi?id=6553").
>
> The "MaxArraySize" is a bit of a misnomer since it's really 1 + the
> top of the valid range of indices -- "MaxArrayIndex" would be more
> apt. Our users were very happy with Grid Engine's allowance of any
> index range and striding that produces no more than "max_aj_tasks"
> indices; since moving to Slurm they're forced to come up with their
> own index-mapping functionality at times, but the relatively low
> MaxArraySize versus what we had in GridEngine (75000) has been
> especially frustrating for them.
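
A minimal sketch of the kind of index mapping described above, in case it helps someone searching the archive. It assumes the job script reads its index from the SLURM_ARRAY_TASK_ID environment variable and that each array task works through a contiguous chunk of logical work items. The chunk size, the item count and the function name are hypothetical.

```python
import os


def items_for_task(task_id: int, chunk: int, total: int) -> range:
    """Logical work items (0-based) handled by array task `task_id`."""
    start = task_id * chunk
    # The last task may get a shorter chunk; tasks past the end get none.
    return range(start, min(start + chunk, total))


if __name__ == "__main__":
    # Hypothetical sizes: 100000 work items, 100 per array task, submitted
    # as --array=0-999, which fits within the default MaxArraySize of 1001.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    work = items_for_task(task_id, chunk=100, total=100000)
    print(f"task {task_id}: {len(work)} items starting at {work.start}")
```

The trade-off is that one array task now covers many work items, so a failure in one item means rerunning or special-casing the whole chunk.
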
>
> So far the 10000/100000 combo hasn't come close to exhausting
> resources on our slurmctld nodes; but we haven't actually submitted a
> couple 10000-index array jobs and enough other jobs to hit 100000
> active jobs, so current memory usage isn't an adequate measure of
> usage under load. Since the slurm.conf documentation states:
>
>     Performance can suffer with more than a few hundred thousand jobs.
>
> we're reluctant to increase MaxJobCount too much higher.
>
>> On Feb 26, 2019, at 3:18 AM, Ole Holm Nielsen
>> <Ole.H.Nielsen at fysik.dtu.dk> wrote:
>>
>> On 2/26/19 9:07 AM, Marcus Wagner wrote:
>>> Does anyone know why the number of array elements is limited to
>>> 1000 by default?
>>> We have one user who would like to have 100k array elements!
>>> Which is more difficult for the scheduler: one array job with 100k
>>> elements, or 100k non-array jobs?
>>> Where did you set the limit? Do your users use array jobs at all?
>>
>> Google is your friend :-)
>>
>> https://slurm.schedmd.com/job_array.html
>>
>>> A new configuration parameter has been added to control the maximum
>>> job array size: MaxArraySize. The smallest index that can be
>>> specified by a user is zero and the maximum index is MaxArraySize
>>> minus one. The default value of MaxArraySize is 1001. The maximum
>>> MaxArraySize supported in Slurm is 4000001. Be mindful about the
>>> value of MaxArraySize as job arrays offer an easy way for users to
>>> submit large numbers of jobs very quickly.
>>
>> /Ole
>>
>
>
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::
> Jeffrey T. Frey, Ph.D.
> Systems Programmer V / HPC Management
> Network & Systems Services / College of Engineering
> University of Delaware, Newark DE 19716
> Office: (302) 831-6034 Mobile: (302) 419-4976
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::
>
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner at itc.rwth-aachen.de
www.itc.rwth-aachen.de