[slurm-users] Priority access for a group of users

Fri Mar 1 12:20:32 UTC 2019

Hello,

Following up on implementing preemption in Slurm. Thank you again for all
the advice. After a short break I've been able to run some basic
experiments. Initially, I have kept things very simple and made the
following changes in my slurm.conf...

# Preemption settings
PreemptType=preempt/partition_prio
PreemptMode=requeue

PartitionName=relgroup nodes=red[465-470] ExclusiveUser=YES MaxCPUsPerNode=40 DefaultTime=02:00:00 MaxTime=60:00:00 QOS=relgroup State=UP AllowAccounts=relgroup Priority=10 PreemptMode=off

# Scavenger partition
PartitionName=scavenger nodes=red[465-470] ExclusiveUser=YES MaxCPUsPerNode=40 DefaultTime=00:15:00 MaxTime=02:00:00 QOS=scavenger State=UP AllowGroups=jfAccessToIridis5 PreemptMode=requeue

The nodes in the relgroup queue are owned by the General Relativity group
and, of course, that group has priority access to these nodes. The general
population can scavenge these nodes via the scavenger queue. When I use
"PreemptMode=cancel" the relgroup jobs preempt the scavenger jobs as
expected (and the scavenger jobs are cancelled). When I set the preempt
mode to "requeue", however, the scavenger jobs are still cancelled/killed.
Have I missed an important configuration change, or is it that lower
priority jobs will always be killed and not re-queued?

Could someone please advise me on this issue? Also, I'm not sure that I
really understand the "requeue" option. Does it mean that the job is
re-queued and run from the beginning, or that it resumes from its current
state (which would need checkpointing)?
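
For reference, one thing worth checking here, offered as a sketch under
assumptions rather than a confirmed diagnosis: according to the Slurm
preemption documentation, PreemptMode=requeue requeues a preempted job only
if that job is eligible for requeue, and cancels it otherwise. Eligibility
comes either from the cluster-wide JobRequeue parameter in slurm.conf or from
the job being submitted with "sbatch --requeue". The value of JobRequeue at
this site is not stated above, so the last line below is an assumption to
verify, not a known fix.

```conf
# Preemption settings (as already shown above)
PreemptType=preempt/partition_prio
PreemptMode=requeue

# Assumption: this line does not appear in the configuration quoted above.
# 1 = batch jobs may be requeued unless submitted with --no-requeue
# 0 = batch jobs are requeued only if submitted with --requeue
JobRequeue=1
```

A requeued batch job returns to the pending state and is started again from
the beginning of its script. Continuing from an intermediate state depends on
the application writing and reloading its own checkpoints. Jobs that cannot
be requeued (for example, interactive srun or salloc sessions) would be
expected to be cancelled instead.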

Best regards,
David

On Tue, Feb 19, 2019 at 2:15 PM Prentice Bisbal <pbisbal at pppl.gov> wrote:

> I just set this up a couple of weeks ago myself. Creating two partitions
> is definitely the way to go. I created one partition, "general" for normal,
> general-access jobs, and another, "interruptible" for general-access jobs
> that can be interrupted, and then set PriorityTier accordingly in my
> slurm.conf file (Node names omitted for clarity/brevity).
>
> PartitionName=general Nodes=... MaxTime=48:00:00 State=Up PriorityTier=10 QOS=general
> PartitionName=interruptible Nodes=... MaxTime=48:00:00 State=Up PriorityTier=1 QOS=interruptible
>
> I then set PreemptMode=Requeue, because I'd rather have jobs requeued than
> suspended. And it's been working great. There are a few other settings I had
> to change. The best documentation for all the settings you need to change
> is https://slurm.schedmd.com/preempt.html
>
> Everything has been working exactly as desired and advertised. My users
> who needed the ability to run low-priority, long-running jobs are very
> happy.
>
> The one caveat is that jobs that will be killed and requeued need to
> support checkpoint/restart. So when this becomes a production thing, users
> are going to have to acknowledge that they will only use this partition for
> jobs that have some sort of checkpoint/restart capability.
>
> Prentice
>
> On 2/15/19 11:56 AM, david baker wrote:
>
> Hi Paul, Marcus,
>
> Thank you for your replies. Using partition priority all makes sense. I
> was thinking of doing something similar with a set of nodes purchased by
> another group. That is, having a private high priority partition and a
> lower priority "scavenger" partition for the public. In this case scavenger
> jobs will get killed when preempted.
>
> In the present case, I did wonder if it would be possible to do something
> with just a single partition -- hence my question. Your replies have
> convinced me that two partitions will work -- with preemption leading to
> re-queued jobs.
>
> Best regards,
> David
>
> On Fri, Feb 15, 2019 at 3:09 PM Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>
>> Yup, PriorityTier is what we use to do exactly that here.  That said,
>> unless you turn on preemption, jobs may still pend if there is no space.  We
>> run with REQUEUE on, which has worked well.
>>
>>
>> -Paul Edmon-
>>
>>
>> On 2/15/19 7:19 AM, Marcus Wagner wrote:
>>
>> Hi David,
>>
>> as far as I know, you can use the PriorityTier (partition parameter) to
>> achieve this. According to the manpages (if I remember right) jobs from
>> higher priority tier partitions have precedence over jobs from lower
>> priority tier partitions, without taking the normal fairshare priority into
>> consideration.
>>
>> Best
>> Marcus
>>
>> On 2/15/19 10:07 AM, David Baker wrote:
>>
>> Hello.
>>
>>
>> We have a small set of compute nodes owned by a group. The group has
>> agreed that the rest of the HPC community can use these nodes providing
>> that they (the owners) can always have priority access to the nodes. The
>> four nodes are well provisioned (1 TByte memory each plus 2 GRID K2
>> graphics cards) and so there is no need to worry about preemption. In fact
>> I'm happy for the nodes to be used as well as possible by all users. It's
>> just that jobs from the owners must take priority if resources are scarce.
>>
>>
>> What is the best way to achieve the above in slurm? I'm planning to place
>> the nodes in their own partition. The node owners will have priority access
>> to the nodes in that partition, but will have no advantage when submitting
>> jobs to the public resources. Does anyone please have any ideas how to deal
>> with this?
>>
>>
>> Best regards,
>>
>> David
>>
>>
>>
>> --
>> Marcus Wagner, Dipl.-Inf.
>>
>> IT Center
>> Abteilung: Systeme und Betrieb
>> RWTH Aachen University
>> Seffenter Weg 23
>> 52074 Aachen
>> Tel: +49 241 80-24383
>> Fax: +49 241 80-624383
>> wagner at itc.rwth-aachen.de
>> www.itc.rwth-aachen.de
>>
>>