[slurm-users] Priority access for a group of users

Fri Mar 1 15:04:14 UTC 2019

My understanding is that with PreemptMode=requeue, the running scavenger
job's processes on the node will be killed, but the job will be placed back
in the queue (assuming the job's specific parameters allow this; a job can
have the --no-requeue flag set, in which case I assume it behaves the same
as PreemptMode=cancel).

When a job which has been requeued starts up a second (or Nth) time, I
believe Slurm basically just reruns the job script.  If the job did not do
any checkpointing, this means the job starts from the very beginning.  If
the job does checkpointing in some fashion, then depending on how the
checkpointing was implemented and the cluster environment, the script might
or might not have to check for the existence of checkpointing data in order
to resume at the last checkpoint.
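
As a rough illustration of that last case, here is a minimal sketch of a
job payload that checks for checkpoint data on startup and resumes from it.
This is only an assumption about how a user might structure such a job; the
file name, the state layout and the helper names (load_checkpoint,
save_checkpoint, run) are all hypothetical and are not anything Slurm
provides. The stop_after argument exists only to simulate a preemption.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        # First start of the job: nothing to resume from.
        return {"next_step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write the state to a temp file, then rename it into place.

    The rename is atomic, so a kill in the middle of a write leaves the
    previous checkpoint intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path, n_steps, stop_after=None):
    """Run the remaining steps, checkpointing after each one.

    stop_after (hypothetical, for demonstration only) makes this run stop
    after that many steps, as if the job had been preempted.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["next_step"] < n_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return state  # simulated preemption
        state["total"] += state["next_step"]  # stand-in for real work
        state["next_step"] += 1
        save_checkpoint(path, state)
        done_this_run += 1
    return state
```

When the job script is rerun after a requeue, calling run() with the same
checkpoint path picks up at the saved step instead of starting from step 0.
The checkpoint file would need to live on storage that survives the job
being killed and is visible from whichever node the job lands on next.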

On Fri, Mar 1, 2019 at 7:24 AM david baker <djbaker12 at gmail.com> wrote:

> Hello,
>
> Following up on implementing preemption in Slurm. Thank you again for all
> the advice. After a short break I've been able to run some basic
> experiments. Initially, I have kept things very simple and made the
> following changes in my slurm.conf...
>
> # Preemption settings
> PreemptType=preempt/partition_prio
> PreemptMode=requeue
>
> PartitionName=relgroup nodes=red[465-470] ExclusiveUser=YES
> MaxCPUsPerNode=40 DefaultTime=02:00:00 MaxTime=60:00:00 QOS=relgroup
> State=UP AllowAccounts=relgroup Priority=10 PreemptMode=off
>
> # Scavenger partition
> PartitionName=scavenger nodes=red[465-470] ExclusiveUser=YES
> MaxCPUsPerNode=40 DefaultTime=00:15:00 MaxTime=02:00:00 QOS=scavenger
> State=UP AllowGroups=jfAccessToIridis5 PreemptMode=requeue
>
> The nodes in the relgroup queue are owned by the General Relativity group
> and, of course, they have priority to these nodes. The general population
> can scavenge these nodes via the scavenger queue. When I use
> "preemptmode=cancel" I'm happy that the relgroup jobs can preempt the
> scavenger jobs (and the scavenger jobs are cancelled). When I set the
> preempt mode to "requeue" I see that the scavenger jobs are still
> cancelled/killed. Have I missed an important configuration change or is it
> that lower priority jobs will always be killed and not re-queued?
>
> Could someone please advise me on this issue? Also I'm wondering if I
> really understand the "requeue" option. Does that mean re-queued and run
> from the beginning or run from the current state (needing check pointing)?
>
> Best regards,
> David
>
> On Tue, Feb 19, 2019 at 2:15 PM Prentice Bisbal <pbisbal at pppl.gov> wrote:
>
>> I just set this up a couple of weeks ago myself. Creating two partitions
>> is definitely the way to go. I created one partition, "general" for normal,
>> general-access jobs, and another, "interruptible" for general-access jobs
>> that can be interrupted, and then set PriorityTier accordingly in my
>> slurm.conf file (Node names omitted for clarity/brevity).
>>
>> PartitionName=general Nodes=... MaxTime=48:00:00 State=Up PriorityTier=10
>> QOS=general
>> PartitionName=interruptible Nodes=... MaxTime=48:00:00 State=Up
>> PriorityTier=1 QOS=interruptible
>>
>> I then set PreemptMode=Requeue, because I'd rather have jobs requeued
>> than suspended. And it's been working great. There are a few other settings I
>> had to change. The best documentation for all the settings you need to
>> change is https://slurm.schedmd.com/preempt.html
>>
>> Everything has been working exactly as desired and advertised. My users
>> who needed the ability to run low-priority, long-running jobs are very
>> happy.
>>
>> The one caveat is that jobs that will be killed and requeued need to
>> support checkpoint/restart. So when this becomes a production thing, users
>> are going to have to acknowledge that they will only use this partition for
>> jobs that have some sort of checkpoint/restart capability.
>>
>> Prentice
>>
>> On 2/15/19 11:56 AM, david baker wrote:
>>
>> Hi Paul, Marcus,
>>
>> Thank you for your replies. Using partition priority all makes sense. I
>> was thinking of doing something similar with a set of nodes purchased by
>> another group. That is, having a private high priority partition and a
>> lower priority "scavenger" partition for the public. In this case scavenger
>> jobs will get killed when preempted.
>>
>> In the present case, I did wonder if it would be possible to do
>> something with just a single partition -- hence my question. Your replies
>> have convinced me that two partitions will work -- with preemption leading
>> to re-queued jobs.
>>
>> Best regards,
>> David
>>
>> On Fri, Feb 15, 2019 at 3:09 PM Paul Edmon <pedmon at cfa.harvard.edu>
>> wrote:
>>
>>> Yup, PriorityTier is what we use to do exactly that here.  That said,
>>> unless you turn on preemption, jobs may still pend if there is no space.  We
>>> run with REQUEUE on which has worked well.
>>>
>>>
>>> -Paul Edmon-
>>>
>>>
>>> On 2/15/19 7:19 AM, Marcus Wagner wrote:
>>>
>>> Hi David,
>>>
>>> as far as I know, you can use the PriorityTier (partition parameter) to
>>> achieve this. According to the manpages (if I remember right) jobs from
>>> higher priority tier partitions have precedence over jobs from lower
>>> priority tier partitions, without taking the normal fairshare priority into
>>> consideration.
>>>
>>> Best
>>> Marcus
>>>
>>> On 2/15/19 10:07 AM, David Baker wrote:
>>>
>>> Hello.
>>>
>>>
>>> We have a small set of compute nodes owned by a group. The group has
>>> agreed that the rest of the HPC community can use these nodes providing
>>> that they (the owners) can always have priority access to the nodes. The
>>> four nodes are well provisioned (1 TByte memory each plus 2 GRID K2
>>> graphics cards) and so there is no need to worry about preemption. In fact
>>> I'm happy for the nodes to be used as well as possible by all users. It's
>>> just that jobs from the owners must take priority if resources are scarce.
>>>
>>>
>>> What is the best way to achieve the above in slurm? I'm planning to
>>> place the nodes in their own partition. The node owners will have priority
>>> access to the nodes in that partition, but will have no advantage when
>>> submitting jobs to the public resources. Does anyone please have any ideas
>>> how to deal with this?
>>>
>>>
>>> Best regards,
>>>
>>> David
>>>
>>>
>>>
>>> --
>>> Marcus Wagner, Dipl.-Inf.
>>>
>>> IT Center
>>> Abteilung: Systeme und Betrieb
>>> RWTH Aachen University
>>> Seffenter Weg 23
>>> 52074 Aachen
>>> Tel: +49 241 80-24383
>>> Fax: +49 241 80-624383
>>> wagner at itc.rwth-aachen.de
>>> www.itc.rwth-aachen.de
>>>
>>>

-- 
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads        payerle at umd.edu
5825 University Research Park               (301) 405-6135
University of Maryland
College Park, MD 20740-3831