<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<style type="text/css">body p { margin-bottom: 0cm; margin-top: 0pt; } </style>
</head>
<body bidimailui-charset-is-forced="true">
<br>
<div class="moz-cite-prefix">On 29/04/2020 12:00:13, navin
srivastava wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAK8-jZD+7ncg=FuNmvH5An8Ak4men5FjNcdTFvwh+Y5BUctaxA@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">Thanks Daniel.
<div> </div>
<div>All jobs went into run state so unable to provide the
details but definitely will reach out later if we see similar
issue.</div>
<div><br>
</div>
<div>i am more interested to understand the FIFO with Fair
Tree.it will be good if anybody provide some insight on this
combination and also if we will enable the backfilling here
how the behaviour will change.</div>
<div><br>
</div>
<div>what is the role of the Fair tree here?<br>
</div>
</div>
</blockquote>
<p>Fair Tree is the algorithm used to calculate the fairshare
factor, i.e. the interim priority before the weight
(PriorityWeightFairshare) is applied, but I think after the
halflife decay.</p>
<p><br>
</p>
<p>To make it simple - fifo without fairshare would assign priority
based only on submission time. With fairshare, that naive priority
is adjusted based on prior usage by the applicable entities
(users/departments, i.e. accounts).</p>
<p><br>
</p>
<p>Backfill will let you utilize your resources better, since it
allows "inserting" lower priority jobs ahead of higher priority
jobs, provided all jobs have defined wall times and no inserted
job delays the start time of a higher priority job. This lets the
scheduler use the "holes" that open up while it waits for
resources to free up in order to start some large job.</p>
<p><br>
</p>
<p>Suppose the system is at 60% utilization of cores, and the next
fifo job requires 42% - it will wait until another 2% are free so
it can begin, meanwhile not allowing any other job to start, even
one that would take only 30% of the resources (which are currently
free) and would finish before the 2% are free anyway.</p>
<p>Backfill would allow such a job to start, as long as its wall time
ensures it would finish before the 42% job would've started.</p>
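<p><br>
</p>
<p>A rough sketch of that check in Python, not Slurm's actual
code. It treats the cluster as a single pool of cores measured in
percent, and the wall time and waiting time (in minutes) are made-up
numbers for illustration only:</p>
<pre>
```python
def can_backfill(free_now, candidate_need, candidate_walltime, blocked_start_in):
    """Return True if a lower priority job may start now without
    delaying the blocked higher priority job (simplified model)."""
    # The candidate must fit into what is free right now.
    fits_now = free_now >= candidate_need
    # It must also finish before the blocked job is expected to start.
    ends_in_time = blocked_start_in >= candidate_walltime
    return fits_now and ends_in_time


# 60% of the cores are busy, so 40% are free; the 42% job is blocked.
free_now = 100 - 60

# Hypothetical: the missing 2% are expected to free up in 60 minutes.
print(can_backfill(free_now, 30, candidate_walltime=45, blocked_start_in=60))  # True
print(can_backfill(free_now, 30, candidate_walltime=90, blocked_start_in=60))  # False
```
</pre>
<p>The second call is refused only because of the declared wall
time, which is why jobs without a sensible time limit can't be
backfilled.</p>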
<p><br>
</p>
<p>Fairtree in either case (fifo or backfill) calculates the
priority for each job the same way - a job from an account that
used more resources recently (the halflife decay factor) gets a
lower priority, even if it was submitted earlier than a job
from an account that didn't use any resources recently.</p>
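<p><br>
</p>
<p>To picture what "recently" means with a halflife, here is a
minimal Python sketch of exponential decay. It only shows the
shape of the decay under my own simplified assumptions; the
function name and the usage numbers are made up, and Slurm's real
accounting and the Fair Tree ranking built on top of it are more
involved:</p>
<pre>
```python
def decayed_usage(raw_usage, age, half_life):
    """Weight usage recorded `age` time units ago by half-life decay.

    `age` and `half_life` must be in the same unit."""
    return raw_usage * 0.5 ** (age / half_life)


half_life = 2.0  # hypothetical value, same unit as age
for age in (0, 2, 4, 8):
    # Each elapsed half-life halves the weight of the old usage.
    print(age, decayed_usage(1000.0, age, half_life))
```
</pre>
<p>After one halflife the old usage counts half, after two a
quarter, and so on, so an account that was busy long ago is
penalized far less than one that was busy just now.</p>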
<p><br>
</p>
<p>As can be expected, backfill has to loop over all jobs in the
queue, in order to see if any job can fit out of order. In very
busy/active systems, that can lead to poor response times, unless
tuned correctly in slurm.conf - look at SchedulerParameters, all
params starting with bf_, and in particular bf_max_job_test=,
bf_max_time= and bf_continue (but bf_window= can also have some
impact if set too high).<br>
</p>
</p>
<p>See the man page at
<a class="moz-txt-link-freetext" href="https://slurm.schedmd.com/slurm.conf.html#OPT_SchedulerParameters">https://slurm.schedmd.com/slurm.conf.html#OPT_SchedulerParameters</a><br>
</p>
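<p>If you want to check which bf_ options a SchedulerParameters
line actually sets, a small Python helper along these lines might
do. The helper name is made up, and the values in the example line
are purely illustrative, not tuning recommendations:</p>
<pre>
```python
def backfill_options(line):
    """Return the bf_* options of a SchedulerParameters line as a dict."""
    # Everything after the first '=' is the comma separated option list.
    _, _, value = line.partition("=")
    options = {}
    for item in value.split(","):
        item = item.strip()
        if not item.startswith("bf_"):
            continue  # skip options unrelated to backfill
        name, sep, setting = item.partition("=")
        # Flags such as bf_continue carry no value; record them as True.
        options[name] = setting if sep else True
    return options


# Illustrative line only; pick values that suit your own cluster.
example = ("SchedulerParameters=bf_continue,bf_max_job_test=500,"
           "bf_max_time=120,bf_window=2880,defer")
print(backfill_options(example))
```
</pre>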
<blockquote type="cite"
cite="mid:CAK8-jZD+7ncg=FuNmvH5An8Ak4men5FjNcdTFvwh+Y5BUctaxA@mail.gmail.com">
<div dir="ltr">
<div><br>
</div>
<div>PriorityType=priority/multifactor<br>
</div>
<div>PriorityDecayHalfLife=2<br>
PriorityUsageResetPeriod=DAILY<br>
PriorityWeightFairshare=500000<br>
PriorityFlags=FAIR_TREE<br>
</div>
<div><br>
</div>
<div>Regards<br>
</div>
<div>Navin.</div>
<div><br>
</div>
<div><br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Apr 27, 2020 at 9:37
PM Daniel Letai <<a href="mailto:dani@letai.org.il"
moz-do-not-send="true">dani@letai.org.il</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Are you sure there are enough resources available? The
node is in mixed state, so it's configured for both
partitions - it's possible that earlier lower priority
jobs are already running thus blocking the later jobs,
especially since it's fifo.</p>
<p><br>
</p>
<p>It would really help if you pasted the results of:</p>
<p>squeue</p>
<p>sinfo</p>
<p><br>
</p>
<p>As well as the exact sbatch line, so we can see how many
resources per node are requested.<br>
</p>
<p><br>
</p>
<div>On 26/04/2020 12:00:06, navin srivastava wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Thanks Brian,
<div><br>
</div>
<div>As suggested i gone through document and what i
understood that the fair tree leads to the Fairshare
mechanism and based on that the job should be
scheduling.</div>
<div><br>
</div>
<div>so it mean job scheduling will be based on FIFO but
priority will be decided on the Fairshare. i am not
sure if both conflicts here.if i see the normal jobs
priority is lower than the GPUsmall priority. so
resources are available with gpusmall partition then
it should go. there is no job pend due to gpu
resources. the gpu resources itself not asked with the
job.</div>
<div><br>
</div>
<div>is there any article where i can see how the
fairshare works and which are setting should not be
conflict with this.</div>
<div>According to document it never says that if
fair-share is applied then FIFO should be disabled.<br>
</div>
<div><br>
</div>
<div>Regards</div>
<div>Navin.</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sat, Apr 25, 2020
at 12:47 AM Brian W. Johanson <<a
href="mailto:bjohanso@psc.edu" target="_blank"
moz-do-not-send="true">bjohanso@psc.edu</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div> <br>
If you haven't looked at the man page for
slurm.conf, it will answer most if not all your
questions. <br>
<a href="https://slurm.schedmd.com/slurm.conf.html"
target="_blank" moz-do-not-send="true">https://slurm.schedmd.com/slurm.conf.html</a>
but I would depend on the the manual version that
was distributed with the version you have installed
as options do change.<br>
<br>
There is a ton of information that is tedious to get
through but reading through it multiple times opens
many doors.<br>
<br>
DefaultTime is listed in there as a Partition
option. <br>
If you are scheduling gres/gpu resources, it's quite
possible there are cores available with no
corresponding gpus avail.<br>
<br>
-b<br>
<br>
<div>On 4/24/20 2:49 PM, navin srivastava wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">Thanks Brian.
<div dir="auto"><br>
</div>
<div dir="auto">I need to check the jobs
order. <br>
<div dir="auto"><br>
</div>
<div dir="auto">Is there any way to define
the default timeline of the job if user not
specifying time limit. </div>
<div dir="auto"><br>
</div>
<div dir="auto">Also what does the meaning of
fairtree in priorities in slurm.Conf file. </div>
<div dir="auto"><br>
</div>
<div dir="auto">The set of nodes are different
in partitions.FIFO does not care for any
partitiong. </div>
<div dir="auto">Is it like strict odering
means the job came 1st will go and until it
runs it will not allow others.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Also priorities is high for
gpusmall partition and low for normal jobs
and the nodes of the normal partition is
full but gpusmall cores are available.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Regards <br>
</div>
<div dir="auto">Navin </div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Apr
24, 2020, 23:49 Brian W. Johanson <<a
href="mailto:bjohanso@psc.edu"
target="_blank" moz-do-not-send="true">bjohanso@psc.edu</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div> <tt>Without seeing the jobs in your
queue, I would expect the next job in FIFO
order to be too large to fit in the
current idle resources. <br>
<br>
Configure it to use the backfill
scheduler: </tt><tt><tt>SchedulerType=sched/backfill<br>
<br>
</tt> SchedulerType<br>
Identifies the type of
scheduler to be used. Note the slurmctld
daemon must be restarted for a change in
scheduler type to become effective
(reconfiguring a running daemon has no
effect for this parameter). The scontrol
command can be used to manually change job
priorities if desired. Acceptable values
include:<br>
<br>
sched/backfill<br>
For a backfill
scheduling module to augment the default
FIFO scheduling. Backfill scheduling will
initiate lower-priority jobs if doing so
does not delay the expected initiation
time of any higher priority job.
Effectiveness of backfill scheduling is
dependent upon users specifying job time
limits, otherwise all jobs will have the
same time limit and backfilling is
impossible. Note documentation for the
SchedulerParameters option above. This is
the default configuration.<br>
<br>
sched/builtin<br>
This is the FIFO
scheduler which initiates jobs in priority
order. If any job in the partition can
not be scheduled, no lower priority job in
that partition will be scheduled. An
exception is made for jobs that can not
run due to partition constraints (e.g. the
time limit) or down/drained nodes. In
that case, lower priority jobs can be
initiated and not impact the higher
priority job.<br>
<br>
<br>
<br>
Your partitions are set with
maxtime=INFINITE, if your users are not
specifying a reasonable timelimit to their
jobs, this won't help either.<br>
<br>
<br>
-b<br>
<br>
</tt><br>
<div>On 4/24/20 1:52 PM, navin srivastava
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">In addition to the above
when i see the sprio of both the jobs it
says :-
<div><br>
</div>
<div>for normal queue jobs all jobs
showing the same priority</div>
<div><br>
</div>
<div> JOBID PARTITION PRIORITY
FAIRSHARE<br>
1291352 normal 15789
15789<br>
</div>
<div><br>
</div>
<div>for GPUsmall all jobs showing the
same priority.</div>
<div><br>
</div>
<div> JOBID PARTITION PRIORITY
FAIRSHARE<br>
1291339 GPUsmall 21052
21053<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On
Fri, Apr 24, 2020 at 11:14 PM navin
srivastava <<a
href="mailto:navin.altair@gmail.com"
rel="noreferrer" target="_blank"
moz-do-not-send="true">navin.altair@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">Hi Team,<br>
<div><br>
</div>
<div>we are facing some issue in our
environment. The resources are
free but job is going into the
QUEUE state but not running.</div>
<div><br>
</div>
<div>i have attached the
slurm.conf file here.</div>
<div><br>
</div>
<div>scenario:-</div>
<div><br>
</div>
<div>There are job only in the 2
partitions:</div>
<div> 344 jobs are in PD state in
normal partition and the node
belongs from the normal partitions
are full and no more job can run.</div>
<div><br>
</div>
<div>1300 JOBS are in GPUsmall
partition are in queue and enough
CPU is avaiable to execute the
jobs but i see the jobs are not
scheduling on free nodes.</div>
<div><br>
</div>
<div>Rest there are no pend jobs in
any other partition .</div>
<div>eg:-</div>
<div>node status:- node18</div>
<div><br>
</div>
<div>NodeName=node18 Arch=x86_64
CoresPerSocket=18<br>
CPUAlloc=6 CPUErr=0 CPUTot=36
CPULoad=4.07<br>
AvailableFeatures=K2200<br>
ActiveFeatures=K2200<br>
Gres=gpu:2<br>
NodeAddr=node18
NodeHostName=node18 Version=17.11<br>
OS=Linux 4.4.140-94.42-default
#1 SMP Tue Jul 17 07:44:50 UTC
2018 (0b375e4)<br>
RealMemory=1 AllocMem=0
FreeMem=79532 Sockets=2 Boards=1<br>
State=MIXED ThreadsPerCore=1
TmpDisk=0 Weight=1 Owner=N/A
MCS_label=N/A<br>
Partitions=GPUsmall,pm_shared<br>
BootTime=2019-12-10T14:16:37
SlurmdStartTime=2019-12-10T14:24:08<br>
CfgTRES=cpu=36,mem=1M,billing=36<br>
AllocTRES=cpu=6<br>
CapWatts=n/a<br>
CurrentWatts=0 LowestJoules=0
ConsumedJoules=0<br>
ExtSensorsJoules=n/s
ExtSensorsWatts=0
ExtSensorsTemp=n/s<br>
</div>
<div><br>
</div>
<div>node19:-</div>
<div><br>
</div>
<div>NodeName=node19 Arch=x86_64
CoresPerSocket=18<br>
CPUAlloc=16 CPUErr=0 CPUTot=36
CPULoad=15.43<br>
AvailableFeatures=K2200<br>
ActiveFeatures=K2200<br>
Gres=gpu:2<br>
NodeAddr=node19
NodeHostName=node19 Version=17.11<br>
OS=Linux 4.12.14-94.41-default
#1 SMP Wed Oct 31 12:25:04 UTC
2018 (3090901)<br>
RealMemory=1 AllocMem=0
FreeMem=63998 Sockets=2 Boards=1<br>
State=MIXED ThreadsPerCore=1
TmpDisk=0 Weight=1 Owner=N/A
MCS_label=N/A<br>
Partitions=GPUsmall,pm_shared<br>
BootTime=2020-03-12T06:51:54
SlurmdStartTime=2020-03-12T06:53:14<br>
CfgTRES=cpu=36,mem=1M,billing=36<br>
AllocTRES=cpu=16<br>
CapWatts=n/a<br>
CurrentWatts=0 LowestJoules=0
ConsumedJoules=0<br>
ExtSensorsJoules=n/s
ExtSensorsWatts=0
ExtSensorsTemp=n/s<br>
</div>
<div><br>
</div>
<div>could you please help me to
understand what could be the
reason?</div>
<div><br>
</div>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<pre cols="72">--
Regards,
Daniel Letai
+972 (0)505 870 456</pre>
</div>
</blockquote>
</div>
</blockquote>
<pre class="moz-signature" cols="72">--
Regards,
Daniel Letai
+972 (0)505 870 456</pre>
</body>
</html>