<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-family:monospace">It is likely that your job still does not have enough priority to preempt the scavenger job. Have a look at the output of `sprio` to see the priority of those jobs and what factors are in play. It may be necessary to increase the partition priority or adjust some of the job priority factors to get the behavior you want.</div><div class="gmail_default" style="font-family:monospace"><br></div><div class="gmail_default" style="font-family:monospace"> - Michael</div></div><br><div class="gmail_quote" style=""><div dir="ltr" class="gmail_attr">On Mon, Mar 4, 2019 at 8:54 AM david baker &lt;<a href="mailto:djbaker12@gmail.com">djbaker12@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">Hello,<div><br></div><div>Thank you for reminding me about the sbatch "--requeue" option. When I submit test jobs using this option, the preemption and subsequent restart of a job work as expected. I've also played around with "preemptmode=suspend" and that also works; however, I suspect we won't use that on these "diskless" nodes. </div><div><br></div><div>As I noted, I can scavenge resources and preempt jobs myself (I am a member of the "relgroup" and the general public). That is:</div><div><br></div><div><div> 347104 scavenger myjob djb1 PD 0:00 1 (Resources)</div><div> 347105 relgroup myjob djb1 R 17:00 1 red465</div></div><div><br></div><div>On the other hand, I do not seem to be able to preempt a job submitted by a colleague. 
That is, my colleague submits a job to the scavenger queue and it starts to run. I then submit a job to the relgroup queue; however, that job fails to preempt my colleague's job and stays in pending status.</div><div><br></div><div>Does anyone understand what might be wrong, please? </div><div><br></div><div>Best regards,</div><div>David</div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Mar 1, 2019 at 2:47 PM Antony Cleave &lt;<a href="mailto:antony.cleave@gmail.com" target="_blank">antony.cleave@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto">I have always assumed that cancel just kills the job, whereas requeue will cancel and then start from the beginning. I know that requeue does this. I never tried cancel.<div dir="auto"><br></div><div dir="auto">I'm a fan of the suspend mode myself, but that is dependent on users not asking for all the RAM by default. If you can educate the users, then this works really well: the low-priority job stays in RAM in suspended mode while the high-priority job completes, and then the low-priority job continues from where it stopped. No checkpoints and no killing.</div><div dir="auto"><div dir="auto"><br></div><div dir="auto">Antony <br><div dir="auto"><br></div><div dir="auto"><br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 1 Mar 2019, 12:23 david baker, &lt;<a href="mailto:djbaker12@gmail.com" target="_blank">djbaker12@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div>Hello,</div><div><br></div><div>Following up on implementing preemption in Slurm. Thank you again for all the advice. After a short break I've been able to run some basic experiments. 
Initially, I have kept things very simple and made the following changes in my slurm.conf...</div><div><br></div><div><div># Preemption settings</div><div>PreemptType=preempt/partition_prio</div><div>PreemptMode=requeue</div></div><div><br></div><div>PartitionName=relgroup nodes=red[465-470] ExclusiveUser=YES MaxCPUsPerNode=40 DefaultTime=02:00:00 MaxTime=60:00:00 QOS=relgroup State=UP AllowAccounts=relgroup Priority=10 PreemptMode=off<br></div><div><br></div><div><div># Scavenger partition</div><div>PartitionName=scavenger nodes=red[465-470] ExclusiveUser=YES MaxCPUsPerNode=40 DefaultTime=00:15:00 MaxTime=02:00:00 QOS=scavenger State=UP AllowGroups=jfAccessToIridis5 PreemptMode=requeue</div></div><div><br></div><div>The nodes in the relgroup queue are owned by the General Relativity group and, of course, they have priority on these nodes. The general population can scavenge these nodes via the scavenger queue. When I use "preemptmode=cancel" I'm happy that the relgroup jobs can preempt the scavenger jobs (and the scavenger jobs are cancelled). When I set the preempt mode to "requeue" I see that the scavenger jobs are still cancelled/killed. Have I missed an important configuration change, or is it that lower-priority jobs will always be killed and not re-queued?</div><div><br></div><div>Could someone please advise me on this issue? Also, I'm wondering if I really understand the "requeue" option. Does that mean re-queued and run from the beginning, or run from the current state (needing checkpointing)?</div><div><br></div><div>Best regards,</div><div>David</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Feb 19, 2019 at 2:15 PM Prentice Bisbal &lt;<a href="mailto:pbisbal@pppl.gov" rel="noreferrer" target="_blank">pbisbal@pppl.gov</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>I just set this up a couple of weeks ago myself. Creating two
partitions is definitely the way to go. I created one partition,
"general", for normal, general-access jobs, and another,
"interruptible", for general-access jobs that can be interrupted,
and then set PriorityTier accordingly in my slurm.conf file (node
names omitted for brevity). <br>
</p>
<p>PartitionName=general Nodes=... MaxTime=48:00:00 State=Up
PriorityTier=10 QOS=general<br>
PartitionName=interruptible Nodes=... MaxTime=48:00:00 State=Up
PriorityTier=1 QOS=interruptible</p>
<p>I then set PreemptMode=Requeue, because I'd rather have jobs
requeued than suspended. And it's been working great. There are a
few other settings I had to change. The best documentation for all
the settings you need to change is
<a class="gmail-m_7221884201063211932gmail-m_1298068336445990039m_-1584200741443832647gmail-m_-1001132135666237498moz-txt-link-freetext" href="https://slurm.schedmd.com/preempt.html" rel="noreferrer" target="_blank">https://slurm.schedmd.com/preempt.html</a></p>
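<p>A minimal sketch of the kind of cluster-wide settings that go with the two partition lines above, assuming partition-priority preemption as described in the linked documentation. The values are illustrative assumptions, not the configuration actually used on this cluster, and the node names remain omitted:</p>

```
# slurm.conf (sketch, values are assumptions)
PreemptType=preempt/partition_prio   # preempt based on partition PriorityTier
PreemptMode=REQUEUE                  # preempted jobs are requeued rather than suspended
JobRequeue=1                         # allow batch jobs to be requeued by default

# Per-partition override: jobs in "general" are never preempted.
PartitionName=general       Nodes=... MaxTime=48:00:00 State=Up PriorityTier=10 QOS=general       PreemptMode=OFF
PartitionName=interruptible Nodes=... MaxTime=48:00:00 State=Up PriorityTier=1  QOS=interruptible PreemptMode=REQUEUE
```

<p>After the controller has picked up the change, "scontrol show partition interruptible" should report the PriorityTier and PreemptMode that are in effect.</p>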
<p>Everything has been working exactly as desired and advertised. My
users who needed the ability to run low-priority, long-running
jobs are very happy. <br>
</p>
<p>The one caveat is that jobs that will be killed and requeued need
to support checkpoint/restart. So when this becomes a production
thing, users are going to have to acknowledge that they will only
use this partition for jobs that have some sort of
checkpoint/restart capability. <br>
</p>
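<p>To illustrate the checkpoint/restart caveat, here is a hedged sketch of a batch script that can survive being requeued. The checkpoint file name, the step count, and the do_step function are hypothetical placeholders for whatever the real application provides; only the sbatch options and the SLURM_RESTART_COUNT variable come from Slurm itself:</p>

```shell
#!/bin/bash
#SBATCH --partition=interruptible   # partition name from the example above
#SBATCH --requeue                   # let Slurm requeue this job after preemption
#SBATCH --open-mode=append          # keep output of earlier runs after a requeue

# Hypothetical names: CKPT_FILE, TOTAL_STEPS and do_step are placeholders.
CKPT_FILE="${CKPT_FILE:-checkpoint.txt}"
TOTAL_STEPS="${TOTAL_STEPS:-5}"

do_step() {
    # Stand-in for one unit of real work.
    echo "step $1"
}

run_from_checkpoint() {
    local i=1
    # Resume after the last completed step if a checkpoint exists.
    if [ -f "$CKPT_FILE" ]; then
        i=$(( $(cat "$CKPT_FILE") + 1 ))
    fi
    while [ "$i" -le "$TOTAL_STEPS" ]; do
        do_step "$i"
        echo "$i" > "$CKPT_FILE"    # record progress after each completed step
        i=$(( i + 1 ))
    done
}

# Slurm sets SLURM_RESTART_COUNT when a job has been requeued and restarted.
echo "restart count: ${SLURM_RESTART_COUNT:-0}"
run_from_checkpoint
```

<p>A requeued job starts the script again from the top, so the only state that carries over is whatever the job itself wrote to disk. Here that is the number of the last completed step.</p>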
<pre class="gmail-m_7221884201063211932gmail-m_1298068336445990039m_-1584200741443832647gmail-m_-1001132135666237498moz-signature" cols="72">Prentice </pre>
<div class="gmail-m_7221884201063211932gmail-m_1298068336445990039m_-1584200741443832647gmail-m_-1001132135666237498moz-cite-prefix">On 2/15/19 11:56 AM, david baker wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi Paul, Marcus,
<div><br>
</div>
<div>Thank you for your replies. Using partition priority all
makes sense. I was thinking of doing something similar with a
set of nodes purchased by another group. That is, having a
private high priority partition and a lower priority
"scavenger" partition for the public. In this case scavenger
jobs will get killed when preempted. </div>
<div><br>
</div>
<div>In the present case, I did wonder if it would be possible
to do something with just a single partition -- hence my
question. Your replies have convinced me that two partitions
will work -- with preemption leading to re-queued jobs. </div>
<div><br>
</div>
<div>Best regards,</div>
<div>David </div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Feb 15, 2019 at 3:09
PM Paul Edmon &lt;<a href="mailto:pedmon@cfa.harvard.edu" rel="noreferrer" target="_blank">pedmon@cfa.harvard.edu</a>&gt; wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>Yup, PriorityTier is what we use to do exactly that
here. That said, unless you turn on preemption, jobs may
still pend if there is no space. We run with REQUEUE on,
which has worked well.</p>
<p><br>
</p>
<p>-Paul Edmon-</p>
<p><br>
</p>
<div class="gmail-m_7221884201063211932gmail-m_1298068336445990039m_-1584200741443832647gmail-m_-1001132135666237498gmail-m_8153567423438616633moz-cite-prefix">On
2/15/19 7:19 AM, Marcus Wagner wrote:<br>
</div>
<blockquote type="cite"> Hi David,<br>
<br>
as far as I know, you can use the PriorityTier (partition
parameter) to achieve this. According to the manpages (if
I remember right) jobs from higher priority tier
partitions have precedence over jobs from lower priority
tier partitions, without taking the normal fairshare
priority into consideration.<br>
<br>
Best<br>
Marcus<br>
<br>
<div class="gmail-m_7221884201063211932gmail-m_1298068336445990039m_-1584200741443832647gmail-m_-1001132135666237498gmail-m_8153567423438616633moz-cite-prefix">On
2/15/19 10:07 AM, David Baker wrote:<br>
</div>
<blockquote type="cite">
<div id="gmail-m_7221884201063211932gmail-m_1298068336445990039m_-1584200741443832647gmail-m_-1001132135666237498gmail-m_8153567423438616633divtagdefaultwrapper" dir="ltr">
<p style="margin-top:0px;margin-bottom:0px">Hello.</p>
<p style="margin-top:0px;margin-bottom:0px"><br>
</p>
<p style="margin-top:0px;margin-bottom:0px">We have a
small set of compute nodes owned by a group. The
group has agreed that the rest of the HPC community
can use these nodes provided that they (the owners)
can always have priority access to the nodes. The
four nodes are well provisioned (1 TB of memory each
plus 2 GRID K2 graphics cards) and so there is no
need to worry about preemption. In fact I'm happy
for the nodes to be used as well as possible by all
users. It's just that jobs from the owners must take
priority if resources are scarce. </p>
<p style="margin-top:0px;margin-bottom:0px"><br>
</p>
<p style="margin-top:0px;margin-bottom:0px">What is
the best way to achieve the above in Slurm? I'm
planning to place the nodes in their own partition.
The node owners will have priority access to the
nodes in that partition, but will have no advantage
when submitting jobs to the public resources. Does
anyone have any ideas on how to deal with this, please?</p>
<p style="margin-top:0px;margin-bottom:0px"><br>
</p>
<p style="margin-top:0px;margin-bottom:0px">Best
regards,</p>
<p style="margin-top:0px;margin-bottom:0px">David</p>
<p style="margin-top:0px;margin-bottom:0px"><br>
</p>
</div>
</blockquote>
<br>
<pre class="gmail-m_7221884201063211932gmail-m_1298068336445990039m_-1584200741443832647gmail-m_-1001132135666237498gmail-m_8153567423438616633moz-signature" cols="72">--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
<a class="gmail-m_7221884201063211932gmail-m_1298068336445990039m_-1584200741443832647gmail-m_-1001132135666237498gmail-m_8153567423438616633moz-txt-link-abbreviated" href="mailto:wagner@itc.rwth-aachen.de" rel="noreferrer" target="_blank">wagner@itc.rwth-aachen.de</a>
<a class="gmail-m_7221884201063211932gmail-m_1298068336445990039m_-1584200741443832647gmail-m_-1001132135666237498gmail-m_8153567423438616633moz-txt-link-abbreviated" href="http://www.itc.rwth-aachen.de" rel="noreferrer" target="_blank">www.itc.rwth-aachen.de</a>
</pre>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote></div></div></div></div></div>
</blockquote></div>
</blockquote></div>
</blockquote></div></div>