<html style="direction: ltr;">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<style type="text/css">body p { margin-bottom: 0cm; margin-top: 0pt; } </style>
</head>
<body bidimailui-charset-is-forced="true" style="direction: ltr;"
text="#000000" bgcolor="#FFFFFF">
<br>
Hello all,<br>
<br>
<p><br>
</p>
<p>We need to run a large number of job arrays (~10k arrays, each
with at most 8M elements), all with the same priority, with
minimal starvation of any single array. We do not want to wait
for each array to complete before starting the next one; instead
we want to "interleave" execution between the arrays. We came up
with the following scheme:</p>
<p><br>
</p>
<p>Start all arrays in this partition in a "Hold" state.</p>
<p>Release a predefined number of elements (e.g., 200).</p>
<p>From this point on, a slurmctld prolog takes over:<br>
</p>
<p>On the 200th job, run squeue and note the next job array (the
array ID following that of the currently executing array).</p>
<p>Release a predefined number of elements (e.g., 200) of that
array.</p>
<p>Repeat.<br>
</p>
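<p>To make the scheme concrete, here is a minimal Python sketch of
the two decisions the prolog would have to make: which array comes
next, and which elements to release. The function names, the
wrap-around to the first array after the last one, and the
assumption that scontrol release accepts an array task range such
as 1002_[200-399] are ours and are not verified; the sketch only
builds the command and does not run it.</p>
<pre>
```python
def next_array_id(array_ids, current_id):
    """Return the array ID that follows current_id in sorted order.

    Wrapping around to the first array after the last one is an assumption.
    """
    ordered = sorted(array_ids)
    position = ordered.index(current_id)
    return ordered[(position + 1) % len(ordered)]


def release_command(array_id, first_task, batch_size, last_task):
    """Build (but do not run) the scontrol call releasing one batch.

    The task-range syntax passed to scontrol is an assumption.
    """
    end_task = min(first_task + batch_size - 1, last_task)
    return ["scontrol", "release", f"{array_id}_[{first_task}-{end_task}]"]
```
</pre>
<p>For example, release_command(1002, 200, 200, 7999999) yields the
argument list for "scontrol release 1002_[200-399]", which the
prolog could pass to subprocess.run.</p>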
<p><br>
</p>
<p>This might produce a very large number of release requests to the
scheduler in a short time frame, and one concern is that the
scheduler loop would be flooded with requests.</p>
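<p>As a rough upper bound on the request volume, the following
Python snippet multiplies out the figures above (10k arrays, 8M
elements each, 200 elements per release). It assumes every array
is at its maximum size and that each batch is a separate release
request, so real numbers should be lower.</p>
<pre>
```python
ARRAYS = 10_000
MAX_ELEMENTS_PER_ARRAY = 8_000_000
BATCH_SIZE = 200


def worst_case_release_requests(arrays, elements_per_array, batch_size):
    """Total release requests if every batch is released separately."""
    # Ceiling division: a partial final batch still costs one request.
    batches_per_array = -(-elements_per_array // batch_size)
    return arrays * batches_per_array


print(worst_case_release_requests(ARRAYS, MAX_ELEMENTS_PER_ARRAY, BATCH_SIZE))
# prints 400000000
```
</pre>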
<p>Can you think of other issues that might come up with this
approach?<br>
</p>
<p><br>
</p>
<p>Do you have any recommendations, or can you suggest a better
approach to solving this problem?</p>
<p><br>
</p>
<p>We have considered fairshare, but all arrays are from the same
account and user. We also tried creating accounts on the fly
(one per array), but we get an error ("This should never happen")
after creating a few thousand accounts.</p>
<p>To my understanding, fairshare is only viable between accounts.<br>
</p>
</body>
</html>