[slurm-users] Areas for improvement on our site's cluster scheduling

Yair Yarom irush at cs.huji.ac.il
Tue May 8 01:37:59 MDT 2018


This is what we did; I'm not sure these are the best solutions :)

## Queue stuffing

We have set PriorityWeightAge several orders of magnitude lower than
PriorityWeightFairshare, and we have also set PriorityMaxAge to cap the
age contribution of older jobs. As I see it, fairshare is far more
important than age.
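To illustrate why the weighting matters, here is a minimal Python sketch of just the age and fair-share terms of Slurm's multifactor priority (PriorityType=priority/multifactor). Each factor is normalized to [0, 1] and multiplied by its weight, and the age factor stops growing once a job reaches PriorityMaxAge. The other terms (job size, partition, QOS, etc.) are left out. The weights, ages and fair-share values are made-up illustration numbers, not our site's settings.

```python
# Simplified sketch: only the age and fair-share terms of the multifactor
# priority. All numbers below are illustrative assumptions.

DAY = 24 * 3600


def age_factor(age_seconds: float, max_age_seconds: float) -> float:
    """Age factor in [0, 1]; it maxes out once age reaches PriorityMaxAge."""
    return min(age_seconds / max_age_seconds, 1.0)


def priority(age_seconds: float, fairshare: float,
             weight_age: int, weight_fairshare: int,
             max_age_seconds: float) -> float:
    """Weighted sum of the age and fair-share factors (other terms omitted)."""
    return (weight_age * age_factor(age_seconds, max_age_seconds)
            + weight_fairshare * fairshare)


WEIGHT_AGE = 10            # orders of magnitude below the fair-share weight
WEIGHT_FAIRSHARE = 100_000
MAX_AGE = 7 * DAY          # hypothetical PriorityMaxAge

# A heavy user's job that has been pending for a month...
old_heavy = priority(30 * DAY, 0.05, WEIGHT_AGE, WEIGHT_FAIRSHARE, MAX_AGE)
# ...versus a light user's job that was just submitted.
new_light = priority(0, 0.6, WEIGHT_AGE, WEIGHT_FAIRSHARE, MAX_AGE)

print(old_heavy < new_light)  # True: fair-share dominates, age is capped
```

Because the age term can never exceed WEIGHT_AGE, waiting longer than PriorityMaxAge gains nothing, and a user with a low fair-share factor cannot "stuff the queue" and win on age alone.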

Besides the MaxJobs limit that was suggested, we are considering setting
a maximum on allowed TRES resources rather than on the number of jobs.
Otherwise a user can submit a single job that takes the entire cluster
and split it up internally however they want. As mentioned earlier, this
will create a situation where jobs are pending while resources sit idle,
but for that we have a special preemptable "requeue" account/QOS which
users can use; jobs there will be killed when "real" jobs arrive.
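A toy Python model of the difference between a job-count limit and a TRES (here, CPU) limit, plus a preemptable "requeue" QOS that soaks up idle CPUs. In Slurm these roughly correspond to limits such as MaxJobs and MaxTRESPerUser and to QOS-based preemption, but this is a hypothetical simplification: the `Job` class and helper functions are invented for illustration, requeue jobs are assumed not to count toward a user's limits, and the "largest job first" victim choice is not Slurm's actual preemption algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    user: str
    cpus: int
    qos: str = "normal"  # "requeue" marks a preemptable job (name assumed)


def counted(running: list[Job], user: str) -> list[Job]:
    """Jobs that count toward a user's limits (assumption: requeue jobs don't)."""
    return [j for j in running if j.user == user and j.qos != "requeue"]


def within_max_jobs(running: list[Job], new: Job, max_jobs: int) -> bool:
    """Job-count limit: blind to how many CPUs each job requests."""
    return len(counted(running, new.user)) + 1 <= max_jobs


def within_max_cpus(running: list[Job], new: Job, max_cpus: int) -> bool:
    """TRES-style limit: caps the total CPUs a user holds at once."""
    used = sum(j.cpus for j in counted(running, new.user))
    return used + new.cpus <= max_cpus


def jobs_to_preempt(running: list[Job], new: Job,
                    total_cpus: int) -> Optional[list[Job]]:
    """Requeue jobs to kill so `new` fits, largest first; None if it can't fit."""
    free = total_cpus - sum(j.cpus for j in running)
    victims: list[Job] = []
    for job in sorted((j for j in running if j.qos == "requeue"),
                      key=lambda j: j.cpus, reverse=True):
        if free >= new.cpus:
            break
        victims.append(job)
        free += job.cpus
    return victims if free >= new.cpus else None


# A 64-CPU cluster: one job asking for every CPU passes a job-count limit
# but not a per-user CPU limit.
whole_cluster = Job("alice", 64)
print(within_max_jobs([], whole_cluster, max_jobs=10))  # True
print(within_max_cpus([], whole_cluster, max_cpus=16))  # False

# Idle CPUs are used by a requeue job; a "real" job evicts it.
running = [Job("bob", 16), Job("carol", 40, qos="requeue")]
print([j.user for j in jobs_to_preempt(running, Job("dave", 24), 64)])  # ['carol']
```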

## Interactive job availability

We have two partitions: short and long. Their node sets are indeed fixed:
short spans 100% of the cluster, while long covers only about 50%-80% of
it (depending on the cluster).
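A small Python sketch of why this layout keeps interactive work available: nodes that appear only in the short partition can never be occupied by long jobs. In slurm.conf this corresponds to giving the long partition's `PartitionName` line a smaller `Nodes=` list than the short one; the helper functions, node names and fractions below are hypothetical.

```python
from __future__ import annotations


def partition_nodes(nodes: list[str], long_fraction: float) -> dict[str, list[str]]:
    """Short gets every node; long gets the first `long_fraction` of them."""
    if not 0.0 < long_fraction <= 1.0:
        raise ValueError("long_fraction must be in (0, 1]")
    n_long = round(len(nodes) * long_fraction)  # whole nodes only
    return {"short": list(nodes), "long": list(nodes[:n_long])}


def short_only_nodes(parts: dict[str, list[str]]) -> list[str]:
    """Nodes long jobs can never occupy, so short/interactive jobs always have room."""
    long_set = set(parts["long"])
    return [n for n in parts["short"] if n not in long_set]


nodes = [f"node{i:02d}" for i in range(1, 11)]  # hypothetical 10-node cluster
for fraction in (0.5, 0.8):
    parts = partition_nodes(nodes, fraction)
    print(fraction, short_only_nodes(parts))
```

The trade-off is the one described above: the short-only nodes may sit idle when there is no short work, in exchange for short and interactive jobs never waiting behind long ones for those nodes.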
