[slurm-users] Priority access for a group of users

Fri Mar 1 15:43:15 UTC 2019

Along those lines, there is the slurm.conf setting for _JobRequeue_ which
controls the default behavior for jobs' ability to be re-queued.
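
Michael's point can be made concrete with a small Python model. It encodes the behavior described in the slurm.conf man page, which is worth double-checking against the man page for your Slurm version. Under PreemptMode=REQUEUE, a preempted job is requeued if possible and otherwise cancelled. A batch job can be requeued if it was submitted with `sbatch --requeue`, or if JobRequeue=1 and it was not submitted with `--no-requeue`. The function name and its arguments are hypothetical, and the other preemption modes (suspend, gang) are not modeled.

```python
from typing import Optional


def preemption_outcome(preempt_mode: str,
                       job_requeue_default: int,
                       job_flag: Optional[str] = None) -> str:
    """Toy model (not Slurm code) of what happens to a preempted batch job.

    preempt_mode: "cancel" or "requeue"; other modes are not modeled.
    job_requeue_default: the slurm.conf JobRequeue value (1 or 0).
    job_flag: None, "--requeue" or "--no-requeue", as given to sbatch.
    """
    mode = preempt_mode.lower()
    if mode == "cancel":
        return "cancelled"
    if mode != "requeue":
        raise ValueError(f"mode not modeled: {preempt_mode}")
    # A per-job sbatch flag overrides the cluster-wide JobRequeue default.
    if job_flag == "--requeue":
        requeueable = True
    elif job_flag == "--no-requeue":
        requeueable = False
    else:
        requeueable = job_requeue_default == 1
    # REQUEUE falls back to cancelling jobs that cannot be requeued.
    return "requeued" if requeueable else "cancelled"


if __name__ == "__main__":
    print(preemption_outcome("requeue", 1))                  # requeued
    print(preemption_outcome("requeue", 0))                  # cancelled
    print(preemption_outcome("requeue", 1, "--no-requeue"))  # cancelled
```

In this model, a cluster with JobRequeue=0 cancels preempted jobs that were submitted without `--requeue`, even under PreemptMode=requeue. That makes JobRequeue one setting worth checking in the situation David describes below.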

 - Michael

On Fri, Mar 1, 2019 at 7:07 AM Thomas M. Payerle <payerle at umd.edu> wrote:

> My understanding is that with PreemptMode=requeue, the running scavenger
> job processes on the node will be killed, but the job will be placed back
> in the queue (assuming the job's specific parameters allow this.  A job can
> have a --no-requeue flag set, in which case I assume it behaves the same as
> PreemptMode=cancel).
>
> When a job which has been requeued starts up a second (or Nth) time, I
> believe Slurm basically just reruns the job script.  If the job did not do
> any checkpointing, this means the job starts from the very beginning.  If
> the job does checkpointing in some fashion, then depending on how the
> checkpointing was implemented and the cluster environment, the script might
> or might not have to check for the existence of checkpointing data in order
> to resume at the last checkpoint.
>
> On Fri, Mar 1, 2019 at 7:24 AM david baker <djbaker12 at gmail.com> wrote:
>
>> Hello,
>>
>> Following up on implementing preemption in Slurm. Thank you again for all
>> the advice. After a short break I've been able to run some basic
>> experiments. Initially, I have kept things very simple and made the
>> following changes in my slurm.conf...
>>
>> # Preemption settings
>> PreemptType=preempt/partition_prio
>> PreemptMode=requeue
>>
>> PartitionName=relgroup nodes=red[465-470] ExclusiveUser=YES MaxCPUsPerNode=40 DefaultTime=02:00:00 MaxTime=60:00:00 QOS=relgroup State=UP AllowAccounts=relgroup Priority=10 PreemptMode=off
>>
>> # Scavenger partition
>> PartitionName=scavenger nodes=red[465-470] ExclusiveUser=YES MaxCPUsPerNode=40 DefaultTime=00:15:00 MaxTime=02:00:00 QOS=scavenger State=UP AllowGroups=jfAccessToIridis5 PreemptMode=requeue
>>
>> The nodes in the relgroup queue are owned by the General Relativity group
>> and, of course, they have priority to these nodes. The general population
>> can scavenge these nodes via the scavenger queue. When I use
>> "preemptmode=cancel" I'm happy that the relgroup jobs can preempt the
>> scavenger jobs (and the scavenger jobs are cancelled). When I set the
>> preempt mode to "requeue" I see that the scavenger jobs are still
>> cancelled/killed. Have I missed an important configuration change or is it
>> that lower priority jobs will always be killed and not re-queued?
>>
>> Could someone please advise me on this issue? Also I'm wondering if I
>> really understand the "requeue" option. Does that mean re-queued and run
>> from the beginning or run from the current state (needing checkpointing)?
>>
>> Best regards,
>> David
>>
>> On Tue, Feb 19, 2019 at 2:15 PM Prentice Bisbal <pbisbal at pppl.gov> wrote:
>>
>>> I just set this up a couple of weeks ago myself. Creating two partitions
>>> is definitely the way to go. I created one partition, "general" for normal,
>>> general-access jobs, and another, "interruptible" for general-access jobs
>>> that can be interrupted, and then set PriorityTier accordingly in my
>>> slurm.conf file (Node names omitted for clarity/brevity).
>>>
>>> PartitionName=general Nodes=... MaxTime=48:00:00 State=Up PriorityTier=10 QOS=general
>>> PartitionName=interruptible Nodes=... MaxTime=48:00:00 State=Up PriorityTier=1 QOS=interruptible
>>>
>>> I then set PreemptMode=Requeue, because I'd rather have jobs requeued
>>> than suspended. And it's been working great. There are a few other settings I
>>> had to change. The best documentation for all the settings you need to
>>> change is https://slurm.schedmd.com/preempt.html
>>>
>>> Everything has been working exactly as desired and advertised. My users
>>> who needed the ability to run low-priority, long-running jobs are very
>>> happy.
>>>
>>> The one caveat is that jobs that will be killed and requeued need to
>>> support checkpoint/restart. So when this becomes a production thing, users
>>> are going to have to acknowledge that they will only use this partition for
>>> jobs that have some sort of checkpoint/restart capability.
>>>
>>> Prentice
>>>
>>> On 2/15/19 11:56 AM, david baker wrote:
>>>
>>> Hi Paul, Marcus,
>>>
>>> Thank you for your replies. Using partition priority all makes sense. I
>>> was thinking of doing something similar with a set of nodes purchased by
>>> another group. That is, having a private high priority partition and a
>>> lower priority "scavenger" partition for the public. In this case scavenger
>>> jobs will get killed when preempted.
>>>
>>> In the present case, I did wonder if it would be possible to do
>>> something with just a single partition -- hence my question. Your replies
>>> have convinced me that two partitions will work -- with preemption leading
>>> to re-queued jobs.
>>>
>>> Best regards,
>>> David
>>>
>>> On Fri, Feb 15, 2019 at 3:09 PM Paul Edmon <pedmon at cfa.harvard.edu>
>>> wrote:
>>>
>>>> Yup, PriorityTier is what we use to do exactly that here.  That said,
>>>> unless you turn on preemption, jobs may still pend if there is no space.  We
>>>> run with REQUEUE on which has worked well.
>>>>
>>>>
>>>> -Paul Edmon-
>>>>
>>>>
>>>> On 2/15/19 7:19 AM, Marcus Wagner wrote:
>>>>
>>>> Hi David,
>>>>
>>>> as far as I know, you can use the PriorityTier (partition parameter) to
>>>> achieve this. According to the manpages (if I remember right) jobs from
>>>> higher priority tier partitions have precedence over jobs from lower
>>>> priority tier partitions, without taking the normal fairshare priority into
>>>> consideration.
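
The ordering Marcus describes can be sketched as a sort key in Python. The Slurm documentation states that a partition's PriorityTier takes precedence over a job's own priority, and this toy model reflects only that rule. It ignores resource availability, backfill, limits and preemption. The class and function names are hypothetical. The example data reuses the partition names and PriorityTier values from Prentice's configuration above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PendingJob:
    job_id: int
    partition: str
    priority: int  # the job's own priority (fairshare, age, ...)


def dispatch_order(jobs: List[PendingJob], tiers: Dict[str, int]) -> List[PendingJob]:
    """Toy model: consider jobs from higher-PriorityTier partitions first,
    and only then compare job priorities. Resources, backfill, limits and
    preemption are ignored."""
    return sorted(jobs, key=lambda j: (-tiers[j.partition], -j.priority))


if __name__ == "__main__":
    tiers = {"general": 10, "interruptible": 1}  # values from Prentice's config
    jobs = [
        PendingJob(1, "interruptible", 5000),
        PendingJob(2, "general", 100),
        PendingJob(3, "general", 300),
    ]
    print([j.job_id for j in dispatch_order(jobs, tiers)])  # [3, 2, 1]
```

Job 1 has by far the highest job priority but is still considered last, because its partition sits in a lower tier.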
>>>>
>>>> Best
>>>> Marcus
>>>>
>>>> On 2/15/19 10:07 AM, David Baker wrote:
>>>>
>>>> Hello.
>>>>
>>>>
>>>> We have a small set of compute nodes owned by a group. The group has
>>>> agreed that the rest of the HPC community can use these nodes providing
>>>> that they (the owners) can always have priority access to the nodes. The
>>>> four nodes are well provisioned (1 TByte memory each plus 2 GRID K2
>>>> graphics cards) and so there is no need to worry about preemption. In fact
>>>> I'm happy for the nodes to be used as well as possible by all users. It's
>>>> just that jobs from the owners must take priority if resources are scarce.
>>>>
>>>>
>>>> What is the best way to achieve the above in slurm? I'm planning to
>>>> place the nodes in their own partition. The node owners will have priority
>>>> access to the nodes in that partition, but will have no advantage when
>>>> submitting jobs to the public resources. Does anyone please have any ideas
>>>> how to deal with this?
>>>>
>>>>
>>>> Best regards,
>>>>
>>>> David
>>>>
>>>>
>>>>
>>>> --
>>>> Marcus Wagner, Dipl.-Inf.
>>>>
>>>> IT Center
>>>> Abteilung: Systeme und Betrieb
>>>> RWTH Aachen University
>>>> Seffenter Weg 23
>>>> 52074 Aachen
>>>> Tel: +49 241 80-24383
>>>> Fax: +49 241 80-624383
>>>> wagner at itc.rwth-aachen.de
>>>> www.itc.rwth-aachen.de
>>>>
>>>>
>
> --
> Tom Payerle
> DIT-ACIGS/Mid-Atlantic Crossroads        payerle at umd.edu
> 5825 University Research Park               (301) 405-6135
> University of Maryland
> College Park, MD 20740-3831
>