[slurm-users] [External] Re: Partition question

Prentice Bisbal pbisbal at pppl.gov
Thu Dec 19 18:30:47 UTC 2019


On 12/19/19 10:44 AM, Ransom, Geoffrey M. wrote:
>
> The simplest is probably to just have a separate partition that will 
> only allow job times of 1 hour or less.
>
> This is how our Univa queues used to work, by overlapping the same 
> hardware. Univa shows available “slots” to the users, and we had a lot 
> of confused users complaining about all those free slots (busy slots 
> in the other queue) while their jobs sat in the queue, and new users 
> confused as to why their jobs were being killed after 4 hours. I was 
> able to move the short/long behavior to job classes, use RQSes, and 
> keep one queue.
>
> While Slurm isn’t showing users unused resources, I am concerned that 
> going back to two queues (partitions) will cause user interaction and 
> adoption problems.
>
>          It all depends on what best suits the specific needs.
>
> Is there a way to have one partition that holds aside a small 
> percentage of resources for jobs with a runtime under 4 hours, i.e. 
> jobs with long runtimes cannot tie up 100% of the resources at one 
> time? Some kind of virtual partition that feeds into two other 
> partitions based on runtime would also work. The goal is that users 
> can continue to submit jobs to one partition, but the scheduler won’t 
> let 100% of the compute resources get tied up with multi-week jobs.
>
The way to do this is with Quality of Service (QOS) in Slurm. When 
creating a QOS, you can cap the total amount of resources (TRES) that 
all jobs running under that QOS can use at once. Create a QOS for the 
longer-running jobs and set its GrpTRES so that the number of CPUs is 
less than 100% of your cluster. Create a QOS for the shorter jobs with 
a shorter time limit (MaxWall).
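As a rough sketch of what that might look like with sacctmgr and 
slurm.conf (the QOS names "short" and "long", the 4-hour cutoff, the 
CPU count, and the node list are placeholders for your own numbers):

    # Cap the long-running QOS at a fraction of the cluster's CPUs
    sacctmgr add qos long
    sacctmgr modify qos long set GrpTRES=cpu=800

    # Short QOS: no CPU cap, but a 4-hour wall-clock limit
    sacctmgr add qos short
    sacctmgr modify qos short set MaxWall=04:00:00

    # In slurm.conf, let the single partition accept both QOSes
    PartitionName=main Nodes=node[01-50] AllowQos=short,long Default=YES

Depending on your accounting setup, you may also need to grant the 
QOSes to the relevant users/associations, e.g. 
"sacctmgr modify user someuser set QOS+=short,long".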

Once the QOSes are set up, you can instruct your users to specify the 
proper QOS when submitting a job, or edit the job_submit.lua script to 
look at the requested time limit and assign/override the QOS based on 
that.
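
For the second approach, a minimal job_submit.lua sketch might look 
like the following (the 4-hour threshold and the QOS names are 
assumptions, and any site-specific policy checks are omitted):

    -- job_submit.lua: route jobs to a QOS based on the requested time limit.
    -- time_limit arrives in minutes; slurm.NO_VAL means no limit was given.
    function slurm_job_submit(job_desc, part_list, submit_uid)
        local four_hours = 4 * 60
        if job_desc.qos == nil then
            if job_desc.time_limit ~= nil and
               job_desc.time_limit ~= slurm.NO_VAL and
               job_desc.time_limit <= four_hours then
                job_desc.qos = "short"   -- placeholder QOS name
            else
                job_desc.qos = "long"    -- placeholder QOS name
            end
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end

With that in place, jobs that don't request a QOS explicitly get routed 
by their time limit, while users who do pass --qos keep their choice.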

--
Prentice


