Dear Xaver,

Xaver Stiensmeier via slurm-users <slurm-users@lists.schedmd.com> writes:
> Dear Slurm User List,
>
> I am currently reviewing a slurm.conf in which the developer set Weight manually, giving a greater weight to machines with more RAM so that smaller jobs are forced onto smaller instances. However, I feel that there is already something in place, or something better than setting the weights manually, but I couldn't find it.
>
> If I understand correctly, Slurm does not schedule jobs to the smallest possible node by default. So small jobs can be scheduled to large instances, and a big job might have to wait indefinitely when using backfilling.
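
For what it's worth, Weight is the mechanism slurm.conf itself provides for this: all things being equal, jobs are allocated the nodes with the lowest weight that satisfy their requirements, and the default Weight is 1. The Python below is only a deliberately simplified model of that rule, not Slurm's actual node-selection code. The node names, memory sizes (in MB, as for RealMemory) and weights are hypothetical, and breaking ties by list order is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    mem_mb: int   # corresponds loosely to RealMemory in slurm.conf
    weight: int   # corresponds loosely to Weight in slurm.conf


def pick_node(nodes: List[Node], job_mem_mb: int) -> Optional[Node]:
    """Return the lowest-weight node with enough memory, or None.

    Simplified model: only memory is checked, and among equally
    weighted candidates the first one in the list wins (min() keeps
    the first minimal element).
    """
    candidates = [n for n in nodes if n.mem_mb >= job_mem_mb]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.weight)


# Hypothetical cluster with weights growing with RAM, as in the config under review.
nodes = [
    Node("big1", 512_000, 100),
    Node("small1", 64_000, 1),
    Node("medium1", 192_000, 10),
]

print(pick_node(nodes, 16_000).name)   # small1
print(pick_node(nodes, 128_000).name)  # medium1
print(pick_node(nodes, 400_000).name)  # big1

# With equal weights this sketch just takes the first node that fits.
uniform = [Node(n.name, n.mem_mb, 1) for n in nodes]
print(pick_node(uniform, 16_000).name)  # big1
```

The last call shows the situation described above: without differing weights nothing steers a small job away from a large node, whereas graded weights keep the big nodes free until only they fit.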

Backfilling, if configured correctly, should not cause any jobs to wait indefinitely. It should only allow jobs to make use of gaps in the scheduling table which would otherwise leave resources unused. However, 'man slurm.conf' gives the following advice for bf_window:

  A value at least as long as the highest allowed time limit is generally advisable to prevent job starvation.

Cheers,
Loris
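
To apply that advice, bf_window, which is given in minutes and defaults to 1440 (one day), has to be compared with the largest MaxTime of your partitions, which is written in Slurm's time format. The helper below is a sketch that assumes MaxTime uses one of the documented forms "M", "M:S", "H:M:S", "D-H", "D-H:M" or "D-H:M:S"; it rejects values such as UNLIMITED. The function names are made up for illustration, and the resulting option would still have to be merged into any existing SchedulerParameters line. The man page also recommends increasing bf_resolution when bf_window is increased.

```python
def slurm_time_to_minutes(spec: str) -> int:
    """Convert a Slurm time limit such as "7-00:00:00" into whole minutes.

    Handles "M", "M:S", "H:M:S", "D-H", "D-H:M" and "D-H:M:S".
    Seconds are rounded up so the result never undershoots the limit.
    Non-numeric values such as "UNLIMITED" raise ValueError.
    """
    days = 0
    if "-" in spec:
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in rest.split(":")]
        if len(fields) > 3:
            raise ValueError(f"too many fields in time limit: {spec!r}")
        # After "D-" the fields are hours[:minutes[:seconds]]; pad missing ones.
        hours, minutes, seconds = fields + [0] * (3 - len(fields))
    else:
        fields = [int(f) for f in spec.split(":")]
        if len(fields) == 1:    # "M"
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:  # "M:S"
            hours = 0
            minutes, seconds = fields
        elif len(fields) == 3:  # "H:M:S"
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"too many fields in time limit: {spec!r}")
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return -(-total_seconds // 60)  # ceiling division to whole minutes


def bf_window_option(max_time: str) -> str:
    """Return a bf_window setting at least as long as the given MaxTime."""
    return f"bf_window={slurm_time_to_minutes(max_time)}"


print(bf_window_option("7-00:00:00"))  # bf_window=10080
print(bf_window_option("2-12"))        # bf_window=3600
```

With a partition MaxTime of seven days, for example, this gives 10080 minutes, well above the default of 1440.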

> I thought that Slurm had mechanisms to prevent this, but I was unable to find them again in the documentation.
>
> Is there really no automatic mechanism in place, or am I overlooking something?
>
> Best,
> Xaver

-- 
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin