[slurm-users] [Long] Why are tasks started on a 30 second clock?
Kirill Katsnelson
kkm at pobox.com
Fri Jul 26 17:49:17 UTC 2019
On Thu, Jul 25, 2019 at 10:20 PM Benjamin Redling <
benjamin.rampe at uni-jena.de> wrote:
> If the 30s delay is only for jobs after the first full queue then it is
> backfill in action?
>
I'm certain this is not the backfill. I see the same behavior when I boot
the controller with all nodes in idle+power-save, and then submit an array.
From the logs, each array job is assigned to a node immediately, the node
is told to power up, and all backfill debug messages since then say "no
jobs to backfill". All nodes are in alloc+powering-up state, all jobs of
the array are CF and have the same timestamp in squeue. But when the nodes
boot and come knocking to the controller, the symmetry is broken and the
jobs transition from CF to R in these curious bunches 30s apart.
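
To put a number on "bunches 30s apart", here is a minimal sketch that groups
job start times into bunches and reports the gap between consecutive bunches.
It assumes start times in ISO format (the kind of thing something like
"sacct -X -n -P -o JobID,Start" should give you). The function name, the
2-second tolerance and the sample timestamps are all made up for
illustration; they are not taken from my logs.

```python
from datetime import datetime


def bunch_gaps(start_times, tolerance=2.0):
    """Group start times into bunches; return (bunch sizes, gaps between bunches).

    Two jobs land in the same bunch when they start within `tolerance`
    seconds of the previous job. Gaps are measured in seconds between
    the first job of each consecutive bunch.
    """
    times = sorted(datetime.strptime(t, "%Y-%m-%dT%H:%M:%S") for t in start_times)
    bunches = []
    for t in times:
        if bunches and (t - bunches[-1][-1]).total_seconds() <= tolerance:
            bunches[-1].append(t)   # close enough to the previous job: same bunch
        else:
            bunches.append([t])     # otherwise a new bunch begins here
    sizes = [len(b) for b in bunches]
    gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(bunches, bunches[1:])]
    return sizes, gaps


# Hypothetical start times, invented to mimic the pattern described above.
starts = [
    "2019-07-26T17:40:01", "2019-07-26T17:40:01", "2019-07-26T17:40:02",
    "2019-07-26T17:40:31", "2019-07-26T17:40:32",
    "2019-07-26T17:41:01",
]
sizes, gaps = bunch_gaps(starts)
print(sizes, gaps)
```

If the gaps cluster tightly around 30 regardless of when the individual nodes
finished booting, that points at a periodic timer on the controller side
rather than at anything the nodes themselves are doing.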
> bf_interval=#
>
Incidentally, bf_interval is set to 5 in my configuration. But thanks for the idea, I'll
search for all "30"-s I can find in all the docs. :-)
-kkm