[slurm-users] Backfill advice

david baker djbaker12 at gmail.com
Sat Mar 23 12:06:15 UTC 2019


We have large jobs being starved out on our cluster, and I note in
particular that we never see a pending job being assigned a start
time. It seems quite possible that backfilled jobs are taking nodes
that should be reserved for large/higher-priority jobs.

I'm wondering whether our backfill configuration has any bearing on this
issue, or whether we are unfortunate enough to have hit a bug. One parameter
that is missing from our bf setup is "bf_continue". Is that parameter
significant in terms of ensuring that bf drills down sufficiently far into
the job mix? Also, we are using the default bf frequency. Should we reduce
the frequency, and potentially reduce the number of bf jobs considered per
group/user or in total at each iteration? Currently, I think we are setting
the per-user limit to 20.
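
For reference, the relevant part of a slurm.conf along these lines might
look roughly like the fragment below. This is only a sketch, not a copy of
our actual file: I am assuming that the per-user limit corresponds to
bf_max_job_user, and bf_interval is written out at what I understand to be
its default of 30 seconds. The commented-out line shows where bf_continue
would be added.

```
# Sketch only; values other than the per-user limit of 20 are assumptions.
SchedulerType=sched/backfill

# Roughly the current setup: default backfill interval, per-user limit of 20.
# bf_max_job_user is assumed to be the parameter behind the "per-user limit".
SchedulerParameters=bf_interval=30,bf_max_job_user=20

# The variant in question, with bf_continue added:
#SchedulerParameters=bf_continue,bf_interval=30,bf_max_job_user=20
```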

Any thoughts would be appreciated, please.

Best regards,