<div dir="ltr"><div>My understanding is that with PreemptMode=requeue, the running scavenger job's processes on the node will be killed, but the job will be placed back in the queue (assuming the job's own parameters allow this; a job can have the --no-requeue flag set, in which case I assume it behaves the same as PreemptMode=cancel).</div><div><br></div><div>When a job which has been requeued starts up a second (or Nth) time, I believe Slurm basically just reruns the job script.  If the job did not do any checkpointing, this means the job starts from the very beginning.  If the job does checkpointing in some fashion, then depending on how the checkpointing was implemented and the cluster environment, the script might or might not have to check for the existence of checkpointing data in order to resume at the last checkpoint. </div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Mar 1, 2019 at 7:24 AM david baker <<a href="mailto:djbaker12@gmail.com">djbaker12@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div>Hello,</div><div><br></div><div>Following up on implementing preemption in Slurm. Thank you again for all the advice. After a short break I've been able to run some basic experiments. 
Initially, I have kept things very simple and made the following changes in my slurm.conf...</div><div><br></div><div><div># Preemption settings</div><div>PreemptType=preempt/partition_prio</div><div>PreemptMode=requeue</div></div><div><br></div><div>PartitionName=relgroup nodes=red[465-470] ExclusiveUser=YES MaxCPUsPerNode=40 DefaultTime=02:00:00 MaxTime=60:00:00 QOS=relgroup State=UP AllowAccounts=relgroup Priority=10 PreemptMode=off<br></div><div><br></div><div><div># Scavenger partition</div><div>PartitionName=scavenger nodes=red[465-470] ExclusiveUser=YES MaxCPUsPerNode=40 DefaultTime=00:15:00 MaxTime=02:00:00 QOS=scavenger State=UP AllowGroups=jfAccessToIridis5 PreemptMode=requeue</div></div><div><br></div><div>The nodes in the relgroup queue are owned by the General Relativity group and, of course, they have priority access to these nodes. The general population can scavenge these nodes via the scavenger queue. When I use "preemptmode=cancel" I'm happy that the relgroup jobs can preempt the scavenger jobs (and the scavenger jobs are cancelled). When I set the preempt mode to "requeue" I see that the scavenger jobs are still cancelled/killed. Have I missed an important configuration change or is it that lower priority jobs will always be killed and not re-queued?</div><div><br></div><div>Could someone please advise me on this issue? Also I'm wondering if I really understand the "requeue" option. Does that mean re-queued and run from the beginning or run from the current state (needing checkpointing)?</div><div><br></div><div>Best regards,</div><div>David</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Feb 19, 2019 at 2:15 PM Prentice Bisbal <<a href="mailto:pbisbal@pppl.gov" target="_blank">pbisbal@pppl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  <div bgcolor="#FFFFFF">

    <p>I just set this up a couple of weeks ago myself. Creating two

      partitions is definitely the way to go. I created one partition,

      "general" for normal, general-access jobs, and another,

      "interruptible" for general-access jobs that can be interrupted,

      and then set PriorityTier accordingly in my slurm.conf file (Node

      names omitted for clarity/brevity). <br>

    </p>

    <p>PartitionName=general Nodes=... MaxTime=48:00:00 State=Up

      PriorityTier=10 QOS=general<br>

      PartitionName=interruptible Nodes=... MaxTime=48:00:00 State=Up

      PriorityTier=1 QOS=interruptible</p>

    <p>I then set PreemptMode=Requeue, because I'd rather have jobs

      requeued than suspended. And it's been working great. There are

      a few other settings I had to change. The best documentation for all

      the settings you need to change is

      <a class="gmail-m_-7597727086525980940gmail-m_-1001132135666237498moz-txt-link-freetext" href="https://slurm.schedmd.com/preempt.html" target="_blank">https://slurm.schedmd.com/preempt.html</a></p>

    <p>Everything has been working exactly as desired and advertised. My

      users who needed the ability to run low-priority, long-running

      jobs are very happy. <br>

    </p>

    <p>The one caveat is that jobs that will be killed and requeued need

      to support checkpoint/restart. So when this becomes a production

      thing, users are going to have to acknowledge that they will only

      use this partition for jobs that have some sort of

      checkpoint/restart capability. <br>

    </p>
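    <p>A minimal sketch of what a checkpoint-aware job script for the
      "interruptible" partition might look like, assuming the work can be
      split into numbered steps. The checkpoint file name and the helper
      functions next_step and run_steps are hypothetical and only for
      illustration; --requeue, SLURM_JOB_ID and SLURM_RESTART_COUNT are
      standard sbatch features. Because a requeued job simply reruns the
      script, the script itself has to look for earlier progress:</p>

```shell
#!/bin/bash
#SBATCH --partition=interruptible   # partition name taken from the config above
#SBATCH --requeue                   # let Slurm requeue this job when it is preempted

# Hypothetical checkpoint file; name and location are illustrative only.
CKPT="${CKPT:-checkpoint.txt}"

# Print the step to resume from: the saved step plus one, or 0 if there is
# no checkpoint yet (first run).
next_step() {
    if [ -s "$1" ]; then
        echo $(( $(cat "$1") + 1 ))
    else
        echo 0
    fi
}

# Run the remaining steps up to and including step $2, recording each
# completed step in checkpoint file $1 so that a rerun can skip it.
run_steps() {
    step=$(next_step "$1")
    while [ "$step" -le "$2" ]; do
        # ... the real work for this step would go here ...
        # Write to a temporary file and rename it, so that being killed
        # mid-write does not leave a truncated checkpoint behind.
        echo "$step" > "$1.tmp"
        mv "$1.tmp" "$1"
        step=$((step + 1))
    done
}

# Only do the work when running under Slurm.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    echo "restart count: ${SLURM_RESTART_COUNT:-0}"
    run_steps "$CKPT" 9
fi
```

    <p>On the first run there is no checkpoint file, so the loop starts at
      step 0. After a preemption and requeue the same script is rerun, finds
      the last completed step in the file, and continues from the next one.
      Real applications would save their actual state, not just a step
      counter.</p>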

    <pre class="gmail-m_-7597727086525980940gmail-m_-1001132135666237498moz-signature" cols="72">Prentice </pre>

    <div class="gmail-m_-7597727086525980940gmail-m_-1001132135666237498moz-cite-prefix">On 2/15/19 11:56 AM, david baker wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">Hi Paul, Marcus,

        <div><br>

        </div>

        <div>Thank you for your replies. Using partition priority all

          makes sense. I was thinking of doing something similar with a

          set of nodes purchased by another group. That is, having a

          private high priority partition and a lower priority

          "scavenger" partition for the public. In this case scavenger

          jobs will get killed when preempted. </div>

        <div><br>

        </div>

        <div>In the present case, I did wonder if it would be possible

          to do something with just a single partition -- hence my

          question. Your replies have convinced me that two partitions

          will work -- with preemption leading to re-queued jobs. </div>

        <div><br>

        </div>

        <div>Best regards,</div>

        <div>David </div>

      </div>

      <br>

      <div class="gmail_quote">

        <div dir="ltr" class="gmail_attr">On Fri, Feb 15, 2019 at 3:09

          PM Paul Edmon <<a href="mailto:pedmon@cfa.harvard.edu" target="_blank">pedmon@cfa.harvard.edu</a>> wrote:<br>

        </div>

        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

          <div bgcolor="#FFFFFF">

            <p>Yup, PriorityTier is what we use to do exactly that

              here.  That said, unless you turn on preemption, jobs may

              still pend if there is no space.  We run with REQUEUE on,

              which has worked well.</p>

            <p><br>

            </p>

            <p>-Paul Edmon-</p>

            <p><br>

            </p>

            <div class="gmail-m_-7597727086525980940gmail-m_-1001132135666237498gmail-m_8153567423438616633moz-cite-prefix">On

              2/15/19 7:19 AM, Marcus Wagner wrote:<br>

            </div>

            <blockquote type="cite"> Hi David,<br>

              <br>

              as far as I know, you can use the PriorityTier (partition

              parameter) to achieve this. According to the manpages (if

              I remember right) jobs from higher priority tier

              partitions have precedence over jobs from lower priority

              tier partitions, without taking the normal fairshare

              priority into consideration.<br>

              <br>

              Best<br>

              Marcus<br>

              <br>

              <div class="gmail-m_-7597727086525980940gmail-m_-1001132135666237498gmail-m_8153567423438616633moz-cite-prefix">On

                2/15/19 10:07 AM, David Baker wrote:<br>

              </div>

              <blockquote type="cite">

                <div id="gmail-m_-7597727086525980940gmail-m_-1001132135666237498gmail-m_8153567423438616633divtagdefaultwrapper" dir="ltr">

                  <p style="margin-top:0px;margin-bottom:0px">Hello.</p>

                  <p style="margin-top:0px;margin-bottom:0px"><br>

                  </p>

                  <p style="margin-top:0px;margin-bottom:0px">We have a

                    small set of compute nodes owned by a group. The

                    group has agreed that the rest of the HPC community

                    can use these nodes providing that they (the owners)

                    can always have priority access to the nodes. The

                    four nodes are well provisioned (1 TByte memory each

                    plus 2 GRID K2 graphics cards) and so there is no

                    need to worry about preemption. In fact I'm happy

                    for the nodes to be used as well as possible by all

                    users. It's just that jobs from the owners must take

                    priority if resources are scarce.  </p>

                  <p style="margin-top:0px;margin-bottom:0px"><br>

                  </p>

                  <p style="margin-top:0px;margin-bottom:0px">What is

                    the best way to achieve the above in slurm? I'm

                    planning to place the nodes in their own partition.

                    The node owners will have priority access to the

                    nodes in that partition, but will have no advantage

                    when submitting jobs to the public resources. Does

                    anyone please have any ideas how to deal with this?</p>

                  <p style="margin-top:0px;margin-bottom:0px"><br>

                  </p>

                  <p style="margin-top:0px;margin-bottom:0px">Best

                    regards,</p>

                  <p style="margin-top:0px;margin-bottom:0px">David</p>

                  <p style="margin-top:0px;margin-bottom:0px"><br>

                  </p>

                </div>

              </blockquote>

              <br>

              <pre class="gmail-m_-7597727086525980940gmail-m_-1001132135666237498gmail-m_8153567423438616633moz-signature" cols="72">-- 
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
<a class="gmail-m_-7597727086525980940gmail-m_-1001132135666237498gmail-m_8153567423438616633moz-txt-link-abbreviated" href="mailto:wagner@itc.rwth-aachen.de" target="_blank">wagner@itc.rwth-aachen.de</a>
<a class="gmail-m_-7597727086525980940gmail-m_-1001132135666237498gmail-m_8153567423438616633moz-txt-link-abbreviated" href="http://www.itc.rwth-aachen.de" target="_blank">www.itc.rwth-aachen.de</a>
</pre>

            </blockquote>

          </div>

        </blockquote>

      </div>

    </blockquote>

  </div>

</blockquote></div></div></div></div></div>

</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr">Tom Payerle <br>DIT-ACIGS/Mid-Atlantic Crossroads        <a href="mailto:payerle@umd.edu" target="_blank">payerle@umd.edu</a><br></div><div>5825 University Research Park               (301) 405-6135<br></div><div dir="ltr">University of Maryland<br>College Park, MD 20740-3831<br></div></div></div></div></div></div>