[slurm-users] Priority access for a group of users
Prentice Bisbal
pbisbal at pppl.gov
Tue Feb 19 14:12:45 UTC 2019
I just set this up a couple of weeks ago myself. Creating two partitions
is definitely the way to go. I created one partition, "general", for
normal general-access jobs, and another, "interruptible", for
general-access jobs that can be interrupted, and then set PriorityTier
accordingly in my slurm.conf file (node names omitted for clarity/brevity).
PartitionName=general Nodes=... MaxTime=48:00:00 State=Up PriorityTier=10 QOS=general
PartitionName=interruptible Nodes=... MaxTime=48:00:00 State=Up PriorityTier=1 QOS=interruptible
I then set PreemptMode=Requeue, because I'd rather have jobs requeued
than suspended, and it's been working great. There are a few other
settings I had to change. The best documentation for all the settings
you need to change is https://slurm.schedmd.com/preempt.html
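For reference, a minimal sketch of the preemption-related slurm.conf
settings described above, assuming partition-priority preemption (the
grace period is optional and site-specific):

# slurm.conf -- cluster-wide preemption settings (sketch, not a drop-in)
PreemptType=preempt/partition_prio   # preempt based on partition PriorityTier
PreemptMode=REQUEUE                  # preempted jobs are requeued rather than suspended
# Optional: give preempted jobs time to checkpoint before they are killed,
# e.g. per partition: PartitionName=interruptible ... GraceTime=120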
Everything has been working exactly as desired and advertised. My users
who needed the ability to run low-priority, long-running jobs are very
happy.
The one caveat is that jobs that will be killed and requeued need to
support checkpoint/restart. So when this becomes a production thing,
users are going to have to acknowledge that they will only use this
partition for jobs that have some sort of checkpoint/restart capability.
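As an illustration, a rough sketch of a requeue-friendly batch script
for the interruptible partition; the solver name and checkpoint file are
hypothetical, the point is --requeue plus resuming from the last
checkpoint when a preempted run left one behind:

#!/bin/bash
#SBATCH --partition=interruptible
#SBATCH --time=48:00:00
#SBATCH --requeue            # allow Slurm to requeue this job when it is preempted

# Hypothetical application: restart from its own checkpoint if a previous
# (preempted) run left one behind, otherwise start from scratch.
if [ -f checkpoint.dat ]; then
    ./my_solver --restart checkpoint.dat
else
    ./my_solver --input input.dat
fi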
Prentice
On 2/15/19 11:56 AM, david baker wrote:
> Hi Paul, Marcus,
>
> Thank you for your replies. Using partition priority all makes sense.
> I was thinking of doing something similar with a set of nodes
> purchased by another group. That is, having a private high priority
> partition and a lower priority "scavenger" partition for the public.
> In this case scavenger jobs will get killed when preempted.
>
> In the present case, I did wonder if it would be possible to do
> something with just a single partition -- hence my question. Your
> replies have convinced me that two partitions will work -- with
> preemption leading to re-queued jobs.
>
> Best regards,
> David
>
> On Fri, Feb 15, 2019 at 3:09 PM Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>
> Yup, PriorityTier is what we use to do exactly that here. That
> said, unless you turn on preemption, jobs may still pend if there is
> no space. We run with REQUEUE on, which has worked well.
>
>
> -Paul Edmon-
>
>
> On 2/15/19 7:19 AM, Marcus Wagner wrote:
>> Hi David,
>>
>> as far as I know, you can use the PriorityTier (partition
>> parameter) to achieve this. According to the manpages (if I
>> remember right) jobs from higher priority tier partitions have
>> precedence over jobs from lower priority tier partitions, without
>> taking the normal fairshare priority into consideration.
>>
>> Best
>> Marcus
>>
>> On 2/15/19 10:07 AM, David Baker wrote:
>>>
>>> Hello.
>>>
>>>
>>> We have a small set of compute nodes owned by a group. The group
>>> has agreed that the rest of the HPC community can use these
>>> nodes providing that they (the owners) can always have priority
>>> access to the nodes. The four nodes are well provisioned (1
>>> TByte memory each plus 2 GRID K2 graphics cards) and so there is
>>> no need to worry about preemption. In fact I'm happy for the
>>> nodes to be used as well as possible by all users. It's just
>>> that jobs from the owners must take priority if resources are
>>> scarce.
>>>
>>>
>>> What is the best way to achieve the above in slurm? I'm planning
>>> to place the nodes in their own partition. The node owners will
>>> have priority access to the nodes in that partition, but will
>>> have no advantage when submitting jobs to the public resources.
>>> Does anyone please have any ideas how to deal with this?
>>>
>>>
>>> Best regards,
>>>
>>> David
>>>
>>>
>>
>> --
>> Marcus Wagner, Dipl.-Inf.
>>
>> IT Center
>> Abteilung: Systeme und Betrieb
>> RWTH Aachen University
>> Seffenter Weg 23
>> 52074 Aachen
>> Tel: +49 241 80-24383
>> Fax: +49 241 80-624383
>> wagner at itc.rwth-aachen.de
>> www.itc.rwth-aachen.de
>