[slurm-users] Areas for improvement on our site's cluster scheduling

Paul Edmon pedmon at cfa.harvard.edu
Tue May 8 08:27:17 MDT 2018


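We've been using a backfill partition, at lower priority, for people
doing HTC (high-throughput computing) work. Preemption there is set to
requeue, so jobs from the high-priority partitions can take over the
nodes and the preempted jobs go back into the queue.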
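
For illustration, a rough sketch of what that kind of setup can look
like in slurm.conf. This is not our production config: the partition
names, node ranges and PriorityTier values are made up, and the rest of
each partition definition is left out.

```
# Preempt by partition priority and requeue the preempted jobs
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE
JobRequeue=1

# Hypothetical partitions sharing the same nodes. Jobs in the partition
# with the higher PriorityTier can preempt jobs in the lower one.
PartitionName=general  Nodes=node[001-100] PriorityTier=10 PreemptMode=OFF Default=YES
PartitionName=backfill Nodes=node[001-100] PriorityTier=1  PreemptMode=REQUEUE
```

Jobs sent to the backfill partition need to tolerate being killed and
restarted, so this fits short or checkpointed HTC work best.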

You can do this for your interactive nodes as well if you want. We
dedicate hardware to interactive work and use partition-based QOSs to
limit usage.
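
A minimal sketch of a partition QOS, assuming a QOS and partition both
called "interactive"; the names, node list and limit values here are
placeholders, not what we actually run:

```
# One-time setup with sacctmgr, run as a Slurm administrator:
#   sacctmgr add qos interactive
#   sacctmgr modify qos interactive set MaxTRESPerUser=cpu=8 MaxJobsPerUser=2

# slurm.conf: enforce limits and attach the QOS to the partition
AccountingStorageEnforce=limits,qos
PartitionName=interactive Nodes=inter[01-04] QOS=interactive MaxTime=08:00:00
```

With the QOS attached to the partition, its limits apply to every job
submitted there, whichever QOS the job itself requests.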

-Paul Edmon-


On 05/08/2018 10:08 AM, Renfro, Michael wrote:
> That’s the first limit I placed on our cluster, and it has generally worked out well (never used a job limit). A single account can get 1000 CPU-days in whatever distribution they want. I’ve just added a root-only ‘expedited’ QOS for times when the cluster is mostly idle, but a few users have jobs that run past the TRES limit. But I really like the idea of a preemptable QOS that the users can put their extra jobs into on their own.
>



