[slurm-users] Priority access for a group of users

Mon Mar 4 16:51:09 UTC 2019

Hello,

Thank you for reminding me about the sbatch "--requeue" option. When I
submit test jobs with this option, the preemption and subsequent restart of
a job work as expected. I've also experimented with "preemptmode=suspend",
and that works too; however, I suspect we won't use it on these
"diskless" nodes.

I find that I can scavenge resources and preempt my own jobs (I am a member
of both "relgroup" and the general public). For example:

            347104 scavenger    myjob     djb1 PD       0:00      1 (Resources)
            347105  relgroup    myjob     djb1  R      17:00      1 red465

On the other hand, I do not seem to be able to preempt a job submitted by a
colleague. That is, my colleague submits a job to the scavenger queue and it
starts to run. I then submit a job to the relgroup queue; however, that job
fails to preempt my colleague's job and stays pending.

Does anyone understand what might be wrong, please?

Best regards,
David

On Fri, Mar 1, 2019 at 2:47 PM Antony Cleave <antony.cleave at gmail.com> wrote:

> I have always assumed that cancel just kills the job, whereas requeue
> kills it and then restarts it from the beginning. I know that requeue does
> this; I have never tried cancel.
>
> I'm a fan of the suspend mode myself, but that depends on users not
> asking for all the RAM by default. If you can educate the users, this
> works really well: the low-priority job stays in RAM in suspended mode
> while the high-priority job completes, and then the low-priority job
> continues from where it stopped. No checkpoints and no killing.
>
> Antony
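A toy illustration of the suspend-mode point above, assuming (as described) that a suspended job keeps its memory while the preempting job runs, so the preempting job only starts if the remaining RAM covers its request. The function name and the memory figures are hypothetical, not Slurm settings.

```python
# Toy model of the PreemptMode=suspend memory constraint described above.
# Assumption: a suspended job stays resident in RAM, so the preempting job
# can only start if the node's remaining memory covers its request.
# All names and numbers are hypothetical.


def fits_alongside_suspended(node_mem_gb: int,
                             suspended_mem_gb: int,
                             new_job_mem_gb: int) -> bool:
    """True if the new job fits in RAM next to the suspended job."""
    return suspended_mem_gb + new_job_mem_gb <= node_mem_gb


if __name__ == "__main__":
    node_mem = 192  # hypothetical node size in GB
    # Scavenger job that (by default) requested all of the node's memory:
    print(fits_alongside_suspended(node_mem, 192, 64))  # False
    # Scavenger job that requested only what it needs:
    print(fits_alongside_suspended(node_mem, 32, 64))   # True
```

This is why educating users about memory requests matters: the same preemption succeeds or fails depending only on how much RAM the suspended job asked for.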
>
>
>
> On Fri, 1 Mar 2019, 12:23 david baker, <djbaker12 at gmail.com> wrote:
>
>> Hello,
>>
>> Following up on implementing preemption in Slurm. Thank you again for all
>> the advice. After a short break I've been able to run some basic
>> experiments. Initially, I have kept things very simple and made the
>> following changes in my slurm.conf...
>>
>> # Preemption settings
>> PreemptType=preempt/partition_prio
>> PreemptMode=requeue
>>
>> PartitionName=relgroup nodes=red[465-470] ExclusiveUser=YES MaxCPUsPerNode=40 DefaultTime=02:00:00 MaxTime=60:00:00 QOS=relgroup State=UP AllowAccounts=relgroup Priority=10 PreemptMode=off
>>
>> # Scavenger partition
>> PartitionName=scavenger nodes=red[465-470] ExclusiveUser=YES MaxCPUsPerNode=40 DefaultTime=00:15:00 MaxTime=02:00:00 QOS=scavenger State=UP AllowGroups=jfAccessToIridis5 PreemptMode=requeue
>>
>> The nodes in the relgroup queue are owned by the General Relativity group
>> and, of course, they have priority on these nodes. The general population
>> can scavenge these nodes via the scavenger queue. When I use
>> "preemptmode=cancel", the relgroup jobs preempt the scavenger jobs as
>> intended (and the scavenger jobs are cancelled). When I set the preempt
>> mode to "requeue", however, the scavenger jobs are still cancelled/killed.
>> Have I missed an important configuration change, or are lower-priority
>> jobs always killed rather than re-queued?
>>
>> Could someone please advise me on this issue? Also, I'm not sure I
>> really understand the "requeue" option. Does it mean the job is re-queued
>> and run from the beginning, or resumed from its current state (which would
>> need checkpointing)?
>>
>> Best regards,
>> David
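A simplified model of the preempt/partition_prio rule this configuration relies on: a pending job can preempt a running job only if its partition has a higher PriorityTier and the running job's partition does not have PreemptMode=off; the running job's own partition then decides what happens to it. Per the slurm.conf documentation, PreemptMode=REQUEUE requeues a job only if it is requeueable (sbatch --requeue, or JobRequeue=1) and otherwise cancels it, which matches the behaviour reported earlier in this thread. The sketch assumes that the legacy Priority=10 on relgroup acts as PriorityTier=10 and that scavenger keeps the default tier of 1; it ignores QOS-based preemption, resource fit and everything else the real scheduler checks.

```python
# Simplified model of PreemptType=preempt/partition_prio (not Slurm's code).
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    priority_tier: int
    preempt_mode: str  # PreemptMode applied to jobs *in* this partition


def can_preempt(pending: Partition, running: Partition) -> bool:
    """A pending job may preempt a running one only from a strictly higher
    tier, and never if the running job's partition has PreemptMode=off."""
    return (pending.priority_tier > running.priority_tier
            and running.preempt_mode.lower() != "off")


def fate_of_preempted(running: Partition, requeueable: bool = True) -> str:
    """Outcome for a preempted job, per its own partition's PreemptMode.
    REQUEUE falls back to cancelling a job that is not requeueable."""
    mode = running.preempt_mode.lower()
    if mode == "requeue":
        return "requeued" if requeueable else "cancelled"
    return {"cancel": "cancelled", "suspend": "suspended"}[mode]


# The two partitions from the slurm.conf above (tier values are assumptions).
relgroup = Partition("relgroup", priority_tier=10, preempt_mode="off")
scavenger = Partition("scavenger", priority_tier=1, preempt_mode="requeue")

if __name__ == "__main__":
    print(can_preempt(relgroup, scavenger))                   # True
    print(can_preempt(scavenger, relgroup))                   # False
    print(fate_of_preempted(scavenger, requeueable=False))    # cancelled
```

A requeued job restarts from the beginning unless the application itself saves and reloads its state.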
>>
>> On Tue, Feb 19, 2019 at 2:15 PM Prentice Bisbal <pbisbal at pppl.gov> wrote:
>>
>>> I just set this up a couple of weeks ago myself. Creating two partitions
>>> is definitely the way to go. I created one partition, "general", for normal
>>> general-access jobs, and another, "interruptible", for general-access jobs
>>> that can be interrupted, and then set PriorityTier accordingly in my
>>> slurm.conf file (node names omitted for clarity/brevity):
>>>
>>> PartitionName=general Nodes=... MaxTime=48:00:00 State=Up PriorityTier=10 QOS=general
>>> PartitionName=interruptible Nodes=... MaxTime=48:00:00 State=Up PriorityTier=1 QOS=interruptible
>>>
>>> I then set PreemptMode=Requeue, because I'd rather have jobs requeued
>>> than suspended, and it's been working great. There are a few other settings
>>> I had to change. The best documentation for all the settings you need to
>>> change is https://slurm.schedmd.com/preempt.html
>>>
>>> Everything has been working exactly as desired and advertised. My users
>>> who needed the ability to run low-priority, long-running jobs are very
>>> happy.
>>>
>>> The one caveat is that jobs that will be killed and requeued need to
>>> support checkpoint/restart. So when this becomes a production thing, users
>>> are going to have to acknowledge that they will only use this partition for
>>> jobs that have some sort of checkpoint/restart capability.
>>>
>>> Prentice
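A minimal sketch of the checkpoint/restart pattern described above, written as a Python batch script (sbatch reads #SBATCH lines from the top of a script regardless of its interpreter). The partition name comes from David's configuration; the checkpoint file name, the CKPT_FILE variable and the step loop are hypothetical placeholders for real work. SLURM_RESTART_COUNT is set by Slurm when a job has been restarted or requeued; here it is only printed.

```python
#!/usr/bin/env python3
#SBATCH --partition=scavenger
#SBATCH --requeue
#SBATCH --time=00:15:00
# Sketch of a requeue-safe job: progress is checkpointed after every step,
# so a requeued run resumes instead of starting over.
import json
import os
from pathlib import Path
from typing import List, Optional

# Hypothetical checkpoint location; use storage visible from every node,
# since a requeued job may restart on a different node.
CKPT = Path(os.environ.get("CKPT_FILE", "checkpoint.json"))


def load_step(path: Path) -> int:
    """Next step to run: 0 on a fresh start, else the value saved last time."""
    if path.exists():
        return json.loads(path.read_text())["next_step"]
    return 0


def save_step(path: Path, next_step: int) -> None:
    """Write a temporary file, then rename it over the old checkpoint, so a
    job killed mid-write does not leave a half-written checkpoint behind."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({"next_step": next_step}))
    os.replace(tmp, path)


def run(total_steps: int, path: Path, stop_after: Optional[int] = None) -> List[int]:
    """Run the remaining steps and return the ones executed in this run.
    stop_after simulates preemption by returning early."""
    done: List[int] = []
    for step in range(load_step(path), total_steps):
        done.append(step)          # placeholder for one unit of real work
        save_step(path, step + 1)  # checkpoint after each unit
        if stop_after is not None and len(done) >= stop_after:
            break
    return done


if __name__ == "__main__":
    restarts = os.environ.get("SLURM_RESTART_COUNT", "0")
    print(f"restart count {restarts}; resuming at step {load_step(CKPT)}")
    run(total_steps=100, path=CKPT)
```

Checkpointing after every unit of work trades some I/O for losing at most one unit when the job is preempted; coarser checkpoints are cheaper but repeat more work after a requeue.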
>>>
>>> On 2/15/19 11:56 AM, david baker wrote:
>>>
>>> Hi Paul, Marcus,
>>>
>>> Thank you for your replies. Using partition priority all makes sense. I
>>> was thinking of doing something similar with a set of nodes purchased by
>>> another group. That is, having a private high priority partition and a
>>> lower priority "scavenger" partition for the public. In this case scavenger
>>> jobs will get killed when preempted.
>>>
>>> In the present case, I did wonder whether it would be possible to do
>>> something with just a single partition, hence my question. Your replies
>>> have convinced me that two partitions will work, with preemption leading
>>> to re-queued jobs.
>>>
>>> Best regards,
>>> David
>>>
>>> On Fri, Feb 15, 2019 at 3:09 PM Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>>>
>>>> Yup, PriorityTier is what we use to do exactly that here. That said,
>>>> unless you turn on preemption, jobs may still pend if there is no space.
>>>> We run with REQUEUE on, which has worked well.
>>>>
>>>>
>>>> -Paul Edmon-
>>>>
>>>>
>>>> On 2/15/19 7:19 AM, Marcus Wagner wrote:
>>>>
>>>> Hi David,
>>>>
>>>> as far as I know, you can use the PriorityTier partition parameter to
>>>> achieve this. According to the man pages (if I remember correctly), jobs
>>>> from partitions with a higher PriorityTier take precedence over jobs from
>>>> partitions with a lower PriorityTier, regardless of the normal fairshare
>>>> priority.
>>>>
>>>> Best
>>>> Marcus
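A simplified sketch of the ordering described above: pending jobs are sorted by their partition's PriorityTier first, and the ordinary job priority (fairshare and so on) only decides the order within a tier. The job IDs and priority values are invented, and backfill, jobs submitted to several partitions and resource availability are not modelled.

```python
# Simplified view of how PriorityTier dominates job priority when ordering
# pending jobs (not Slurm's actual implementation).
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PendingJob:
    job_id: int
    partition_tier: int  # PriorityTier of the job's partition
    priority: int        # ordinary job priority, e.g. from fairshare


def scheduling_order(jobs: List[PendingJob]) -> List[int]:
    """Job IDs in the order they are considered: tier first, then priority."""
    ranked = sorted(jobs, key=lambda j: (-j.partition_tier, -j.priority))
    return [j.job_id for j in ranked]


if __name__ == "__main__":
    jobs = [
        PendingJob(job_id=1, partition_tier=1, priority=9000),  # public, high fairshare
        PendingJob(job_id=2, partition_tier=10, priority=100),  # owner, low fairshare
        PendingJob(job_id=3, partition_tier=10, priority=500),  # owner
    ]
    print(scheduling_order(jobs))  # [3, 2, 1]
```

Even with a far higher fairshare priority, the public job is considered only after both owner jobs.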
>>>>
>>>> On 2/15/19 10:07 AM, David Baker wrote:
>>>>
>>>> Hello.
>>>>
>>>>
>>>> We have a small set of compute nodes owned by a group. The group has
>>>> agreed that the rest of the HPC community can use these nodes provided
>>>> that they (the owners) always have priority access to them. The
>>>> four nodes are well provisioned (1 TB of memory each plus 2 GRID K2
>>>> graphics cards), and so there is no need to worry about preemption. In
>>>> fact, I'm happy for the nodes to be used as much as possible by all users.
>>>> It's just that jobs from the owners must take priority if resources are scarce.
>>>>
>>>>
>>>> What is the best way to achieve the above in Slurm? I'm planning to
>>>> place the nodes in their own partition. The node owners will have priority
>>>> access to the nodes in that partition, but will have no advantage when
>>>> submitting jobs to the public resources. Does anyone have any ideas on
>>>> how to deal with this, please?
>>>>
>>>>
>>>> Best regards,
>>>>
>>>> David
>>>>
>>>>
>>>>
>>>> --
>>>> Marcus Wagner, Dipl.-Inf.
>>>>
>>>> IT Center
>>>> Abteilung: Systeme und Betrieb
>>>> RWTH Aachen University
>>>> Seffenter Weg 23
>>>> 52074 Aachen
>>>> Tel: +49 241 80-24383
>>>> Fax: +49 241 80-624383
>>>> wagner at itc.rwth-aachen.de
>>>> www.itc.rwth-aachen.de
>>>>
>>>>