[slurm-users] [External] Re: Partition question
Prentice Bisbal
pbisbal at pppl.gov
Thu Dec 19 18:30:47 UTC 2019
On 12/19/19 10:44 AM, Ransom, Geoffrey M. wrote:
>
> The simplest is probably to just have a separate partition that will
> only allow job times of 1 hour or less.
>
> This is how our Univa queues used to work, by overlapping the same
> hardware. Univa shows available “slots” to the users, and we had a lot
> of confused users complaining about all those free slots (actually busy
> slots in the other queue) while their jobs sat in the queue. New users
> were also confused as to why their jobs were being killed after 4
> hours. I was able to move the short/long behavior to job classes and
> RQSes and have one queue.
>
> While Slurm isn’t showing users unused resources, I am concerned that
> going back to two queues (partitions) will cause user interaction and
> adoption problems.
>
> It all depends on what best suits the specific needs.
>
> Is there a way to have one partition that holds aside a small
> percentage of resources for jobs with a runtime under 4 hours, so that
> jobs with long runtimes cannot tie up 100% of the resources at one
> time? Some kind of virtual partition that feeds into two other
> partitions based on runtime would also work. The goal is that users
> can continue to submit jobs to one partition but the scheduler won’t
> let 100% of the compute resources get tied up with multi-week-long
> jobs.
>
The way to do this is with Quality of Service (QOS) in Slurm. When
creating a QOS, you can cap the total amount of resources (TRES) that
all jobs running under it can use at once. Create a QOS for the
longer-running jobs and set its group TRES limit (GrpTRES) so that the
number of CPUs is less than 100% of your cluster, and create a QOS for
the shorter jobs with a shorter time limit (MaxWall).
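Something along these lines with sacctmgr would do it. The QOS names and
the cpu figure are just placeholders, not something from your setup;
size the GrpTRES limit to leave whatever headroom you want for the
short jobs:

    # "long" is capped at 900 CPUs cluster-wide; "short" gets a 4-hour wall limit
    sacctmgr add qos long
    sacctmgr modify qos long set GrpTRES=cpu=900
    sacctmgr add qos short
    sacctmgr modify qos short set MaxWall=04:00:00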
Once the QOSes are set up, you can instruct your users to specify the
proper QOS when submitting a job, or edit the job_submit.lua script to
look at the time limit specified and assign/override the QOS based on
that.
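A minimal job_submit.lua sketch of that idea (the 4-hour cutoff and the
QOS names "short" and "long" are assumptions, to be matched to whatever
QOSes you actually define):

    -- Route jobs to a QOS based on the requested time limit.
    function slurm_job_submit(job_desc, part_list, submit_uid)
       -- time_limit is in minutes; if the user gave no limit it comes
       -- through as nil or a very large sentinel, so it falls to "long"
       if job_desc.time_limit ~= nil and job_desc.time_limit <= 240 then
          job_desc.qos = "short"
       else
          job_desc.qos = "long"
       end
       return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
       return slurm.SUCCESS
    end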
--
Prentice