[slurm-users] Areas for improvement on our site's cluster scheduling

John Hearns hearnsj at googlemail.com
Tue May 8 01:51:33 MDT 2018


"Otherwise a user can have a sing le job that takes the entire cluster,
and insidesplit it up the way he wants to."
Yair, I agree. That is what I was referring to regardign interactive jobs.
Perhaps not a user reserving the entire cluster,
but a use reserving a lot of compute nodes and not making sure they are
utilised fully.

On 8 May 2018 at 09:37, Yair Yarom <irush at cs.huji.ac.il> wrote:

> Hi,
>
> This is what we did, not sure those are the best solutions :)
>
> ## Queue stuffing
>
> We have set PriorityWeightAge several magnitudes lower than
> PriorityWeightFairshare, and we also use PriorityMaxAge to cap the age
> contribution of older jobs. As I see it, fairshare is far more
> important than age.
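>
> As a rough sketch (the numbers here are placeholders, not our exact
> values), the relevant slurm.conf settings would look something like:
>
>     # Fairshare dominates; the age factor is several magnitudes smaller
>     PriorityType=priority/multifactor
>     PriorityWeightFairshare=100000
>     PriorityWeightAge=100
>     # Cap how much priority a job can accrue from age alone
>     PriorityMaxAge=7-0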
>
> Besides the MaxJobs that was suggested, we are considering setting a
> maximum on allowed TRES resources rather than on the number of jobs.
> Otherwise a user can have a single job that takes the entire cluster
> and split it up internally however they want. As mentioned earlier,
> this will create situations where jobs are pending while resources sit
> idle, but for that we have a special preempt-able "requeue" account/qos
> which users can use; the jobs there will be killed when "real" jobs
> arrive.
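>
> Roughly what we have in mind (a sketch only; the limits, the "normal"
> qos name, and the TRES values are illustrative):
>
>     # slurm.conf: let higher-qos jobs requeue lower-qos ones
>     PreemptType=preempt/qos
>     PreemptMode=REQUEUE
>
>     # cap each user's total resources instead of their job count
>     sacctmgr modify qos normal set MaxTRESPerUser=cpu=256,gres/gpu=8
>
>     # low-priority "requeue" qos; jobs in "normal" may push its jobs out
>     sacctmgr add qos requeue
>     sacctmgr modify qos requeue set Priority=0
>     sacctmgr modify qos normal set Preempt=requeue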
>
> ## Interactive job availability
>
> We have two partitions: short and long. Their sizes are indeed fixed:
> short spans 100% of the cluster, while long covers about 50%-80% of it
> (depending on the cluster).
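>
> For reference, a minimal sketch of that layout in slurm.conf (node
> names and time limits are made up):
>
>     # "short" sees every node, "long" only a subset of them
>     PartitionName=short Nodes=node[01-20] Default=YES MaxTime=1-00:00:00 State=UP
>     PartitionName=long  Nodes=node[01-15] MaxTime=14-00:00:00 State=UP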
>
>