<div><div dir="auto">Hello,</div></div><div dir="auto"><br></div><div dir="auto">At first blush, bf_continue and bf_interval, as well as the bf_max_job_* limits such as bf_max_job_test (if I remember the parameter names correctly), are critical first steps in tuning. Setting DebugFlags=backfill is essential for getting the data needed to make tuning decisions.</div><div dir="auto"><br></div><div dir="auto">Per-user or per-account limits that are set too low can also cause starvation, depending on how your priority calculation is set up.</div><div dir="auto"><br></div><div dir="auto">I presented these slides on this topic at the Slurm User Group a few years ago: <div><a href="https://slurm.schedmd.com/SLUG16/NERSC.pdf">https://slurm.schedmd.com/SLUG16/NERSC.pdf</a></div><div dir="auto"><br></div><div dir="auto">The key thing to keep in mind with large jobs is that Slurm needs to evaluate them again and again in the same order, or the scheduled start time may drift. Thus it is important that once a job starts getting planning reservations, it continues to get them.</div><div dir="auto"><br></div><div dir="auto">Because of the prevalence of large jobs at our site, we use bf_min_prio_reserve, which splits the priority space into a reserving and a non-reserving set, and then use job age to let jobs age from the non-reserving portion of the priority space into the reserving portion. 
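<div dir="auto"><br></div><div dir="auto">As a rough sketch, the scheduler options mentioned so far might be combined in slurm.conf along these lines. All numeric values are illustrative assumptions rather than our site's settings, and the option spellings (in particular bf_max_job_test and bf_min_prio_reserve) should be checked against the slurm.conf man page for your Slurm version:</div><pre># Illustrative slurm.conf fragment; every numeric value is an assumption
SchedulerType=sched/backfill
# bf_continue: keep working through the queue after the backfill pass yields its locks
# bf_interval: seconds between backfill passes
# bf_max_job_test: maximum number of jobs considered per pass
# bf_min_prio_reserve: only jobs at or above this priority get planning reservations
SchedulerParameters=bf_continue,bf_interval=60,bf_max_job_test=1000,bf_min_prio_reserve=100000
# Log backfill decisions so tuning can be based on data
DebugFlags=Backfill
</pre><div dir="auto"><br></div>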
The recently added MaxJobsAccruePerUser limit on a job QOS can throttle the rate at which jobs age and prevent negative effects from users submitting large numbers of jobs.</div><div dir="auto"><br></div><div dir="auto">I realize that is a large number of tunables and concepts densely packed, but it should give you some reasonable starting points.</div><div dir="auto"><br></div><div dir="auto">Doug</div><div dir="auto"><br></div><div dir="auto"><br></div></div><div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Mar 23, 2019 at 05:26 david baker &lt;<a href="mailto:djbaker12@gmail.com">djbaker12@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hello,</div><div><br></div><div>We do have large jobs getting starved out on our cluster, and I note particularly that we never manage to see a job getting assigned a start time. It seems very possible that backfilled jobs are stealing nodes reserved for large/higher priority jobs.<br></div><div><br></div><div>I'm wondering if our backfill configuration has any bearing on this issue or whether we are unfortunate enough to have hit a bug. One parameter that is missing in our bf setup is "bf_continue". Is that parameter significant in terms of ensuring that bf drills down sufficiently in the job mix? Also we are using the default bf frequency -- should we really reduce the frequency and potentially reduce the number of bf jobs per group/user or total at each iteration? Currently, I think we are setting the per/user limit to 20.</div><div><br></div><div>Any thoughts would be appreciated, please.</div><div><br></div><div>Best regards,</div><div>David</div></div>

</blockquote></div></div>