<div dir="ltr"><div dir="ltr">Thanks for your suggestions, Marcus. I have restarted the services and have also been adjusting various parameters (probably more than I should). Nothing seems to help.</div><div dir="ltr"><br></div><div>I'm not ready to upgrade to Slurm 18, so I guess I'll have to live with it...</div><div><br></div><div>Best,</div><div>Randy</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Apr 3, 2019 at 1:50 AM Marcus Wagner &lt;<a href="mailto:wagner@itc.rwth-aachen.de">wagner@itc.rwth-aachen.de</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
Hmm...,<br>
<br>
I'm a bit puzzled; it seems to be OK as far as I can tell.<br>
<br>
Did you try to restart slurmctld?<br>
I once had a case where users could no longer submit to the default
partition because SLURM rejected their jobs with (if I remember right) <br>
wrong account/partition combination<br>
or something like that.<br>
My first suspicion was my submission script, since I had changed it
recently, but I could not find any error. scontrol reconfig did not
help. <br>
But everything worked again after I restarted slurmctld.<br>
<br>
Might be worth a try.<br>
<br>
<br>
Best<br>
Marcus<br>
<br>
<div class="gmail-m_-2848306032173127196moz-cite-prefix">On 4/2/19 1:24 PM, Randall Radmer
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">Hi Marcus,
<div><br>
</div>
<div>The following jobs are running or pending after I
killed job 100816, which was running on
computelab-134's T4:</div>
<div>
<div>100815 RUNNING computelab-134 gpu:gv100:1
None1</div>
<div>100817 PENDING gpu:gv100:1 Resources1</div>
<div>100818 PENDING gpu:tu104:1 Resources1</div>
</div>
<div><br>
</div>
<div>
<div>$ scontrol -d show node computelab-134</div>
<div>NodeName=computelab-134 Arch=x86_64
CoresPerSocket=6</div>
<div> CPUAlloc=6 CPUErr=0 CPUTot=12 CPULoad=0.00</div>
<div> AvailableFeatures=(null)</div>
<div> ActiveFeatures=(null)</div>
<div> Gres=gpu:gv100:1,gpu:tu104:1</div>
<div> GresDrain=N/A</div>
<div>
GresUsed=gpu:gv100:1(IDX:0),gpu:tu104:0(IDX:N/A)</div>
<div> NodeAddr=computelab-134
NodeHostName=computelab-134 Version=17.11</div>
<div> OS=Linux 4.4.0-143-generic #169-Ubuntu SMP
Thu Feb 7 07:56:38 UTC 2019 </div>
<div> RealMemory=64307 AllocMem=32148
FreeMem=61126 Sockets=2 Boards=1</div>
<div> State=MIXED ThreadsPerCore=1
TmpDisk=404938 Weight=1 Owner=N/A MCS_label=N/A</div>
<div> Partitions=test-backfill </div>
<div> BootTime=2019-03-29T12:09:25
SlurmdStartTime=2019-04-01T11:34:35</div>
<div>
CfgTRES=cpu=12,mem=64307M,billing=12,gres/gpu=2,gres/gpu:gv100=1,gres/gpu:tu104=1</div>
<div>
AllocTRES=cpu=6,mem=32148M,gres/gpu=1,gres/gpu:gv100=1</div>
<div> CapWatts=n/a</div>
<div> CurrentWatts=0 LowestJoules=0
ConsumedJoules=0</div>
<div> ExtSensorsJoules=n/s ExtSensorsWatts=0
ExtSensorsTemp=n/s</div>
</div>
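<div>As a cross-check of the node output above, here is a minimal Python sketch that subtracts GresUsed from Gres per GPU type. The helper names parse_gres and free_gres are made up for illustration and are not part of Slurm. It only counts devices and ignores CPUs, memory, and core binding.</div>

```python
import re


def parse_gres(field):
    """Turn 'gpu:gv100:1(IDX:0),gpu:tu104:0(IDX:N/A)' into {'gv100': 1, 'tu104': 0}."""
    counts = {}
    # Each entry looks like name:type:count; the (IDX:...) suffix is skipped.
    for _name, gtype, count in re.findall(r"(\w+):(\w+):(\d+)", field):
        counts[gtype] = counts.get(gtype, 0) + int(count)
    return counts


def free_gres(configured, used):
    """Per-type count of devices that are configured but not allocated."""
    cfg, busy = parse_gres(configured), parse_gres(used)
    return {gtype: total - busy.get(gtype, 0) for gtype, total in cfg.items()}


# Values copied from the Gres= and GresUsed= lines above.
free = free_gres("gpu:gv100:1,gpu:tu104:1",
                 "gpu:gv100:1(IDX:0),gpu:tu104:0(IDX:N/A)")
print(free)
```

<div>With the values shown, the tu104 device counts as free and the gv100 device as busy, which matches what I expected after killing job 100816.</div>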
<div><br>
</div>
<div>
<div>$ scontrol -d show job 100815</div>
<div>JobId=100815 JobName=bash</div>
<div> UserId=rradmer(27578) GroupId=hardware(30)
MCS_label=N/A</div>
<div> Priority=1 Nice=0 Account=cag QOS=normal</div>
<div> JobState=RUNNING Reason=None
Dependency=(null)</div>
<div> Requeue=1 Restarts=0 BatchFlag=0 Reboot=0
ExitCode=0:0</div>
<div> DerivedExitCode=0:0</div>
<div> RunTime=00:06:45 TimeLimit=02:00:00
TimeMin=N/A</div>
<div> SubmitTime=2019-04-02T05:13:05
EligibleTime=2019-04-02T05:13:05</div>
<div> StartTime=2019-04-02T05:13:05
EndTime=2019-04-02T07:13:05 Deadline=N/A</div>
<div> PreemptTime=None SuspendTime=None
SecsPreSuspend=0</div>
<div> LastSchedEval=2019-04-02T05:13:05</div>
<div> Partition=test-backfill
AllocNode:Sid=computelab-frontend-02:7873</div>
<div> ReqNodeList=computelab-134
ExcNodeList=(null)</div>
<div> NodeList=computelab-134</div>
<div> BatchHost=computelab-134</div>
<div> NumNodes=1 NumCPUs=6 NumTasks=1
CPUs/Task=6 ReqB:S:C:T=0:0:*:*</div>
<div>
TRES=cpu=6,mem=32148M,node=1,billing=6,gres/gpu=1,gres/gpu:gv100=1</div>
<div> Socks/Node=* NtasksPerN:B:S:C=0:0:*:*
CoreSpec=*</div>
<div> Nodes=computelab-134 CPU_IDs=0-5
Mem=32148 GRES_IDX=gpu:gv100(IDX:0)</div>
<div> MinCPUsNode=6 MinMemoryNode=32148M
MinTmpDiskNode=0</div>
<div> Features=(null) DelayBoot=00:00:00</div>
<div> Gres=gpu:gv100:1 Reservation=(null)</div>
<div> OverSubscribe=OK Contiguous=0
Licenses=(null) Network=(null)</div>
<div> Command=/bin/bash</div>
<div> WorkDir=/home/rradmer</div>
<div> Power=</div>
</div>
<div><br>
</div>
<div>
<div>$ scontrol -d show job 100817</div>
<div>JobId=100817 JobName=bash</div>
<div> UserId=rradmer(27578) GroupId=hardware(30)
MCS_label=N/A</div>
<div> Priority=1 Nice=0 Account=cag QOS=normal</div>
<div> JobState=PENDING Reason=Resources
Dependency=(null)</div>
<div> Requeue=1 Restarts=0 BatchFlag=0 Reboot=0
ExitCode=0:0</div>
<div> DerivedExitCode=0:0</div>
<div> RunTime=00:00:00 TimeLimit=02:00:00
TimeMin=N/A</div>
<div> SubmitTime=2019-04-02T05:13:11
EligibleTime=2019-04-02T05:13:11</div>
<div> StartTime=2019-04-02T07:13:05
EndTime=2019-04-02T09:13:05 Deadline=N/A</div>
<div> PreemptTime=None SuspendTime=None
SecsPreSuspend=0</div>
<div> LastSchedEval=2019-04-02T05:20:44</div>
<div> Partition=test-backfill
AllocNode:Sid=computelab-frontend-03:21736</div>
<div> ReqNodeList=computelab-134
ExcNodeList=(null)</div>
<div> NodeList=(null)
SchedNodeList=computelab-134</div>
<div> NumNodes=1-1 NumCPUs=6 NumTasks=1
CPUs/Task=6 ReqB:S:C:T=0:0:*:*</div>
<div>
TRES=cpu=6,mem=32148M,node=1,gres/gpu=1,gres/gpu:gv100=1</div>
<div> Socks/Node=* NtasksPerN:B:S:C=0:0:*:*
CoreSpec=*</div>
<div> MinCPUsNode=6 MinMemoryNode=32148M
MinTmpDiskNode=0</div>
<div> Features=(null) DelayBoot=00:00:00</div>
<div> Gres=gpu:gv100:1 Reservation=(null)</div>
<div> OverSubscribe=OK Contiguous=0
Licenses=(null) Network=(null)</div>
<div> Command=/bin/bash</div>
<div> WorkDir=/home/rradmer</div>
<div> Power=</div>
</div>
<div><br>
</div>
<div>
<div>$ scontrol -d show job 100818</div>
<div>JobId=100818 JobName=bash</div>
<div> UserId=rradmer(27578) GroupId=hardware(30)
MCS_label=N/A</div>
<div> Priority=1 Nice=0 Account=cag QOS=normal</div>
<div> JobState=PENDING Reason=Resources
Dependency=(null)</div>
<div> Requeue=1 Restarts=0 BatchFlag=0 Reboot=0
ExitCode=0:0</div>
<div> DerivedExitCode=0:0</div>
<div> RunTime=00:00:00 TimeLimit=02:00:00
TimeMin=N/A</div>
<div> SubmitTime=2019-04-02T05:13:12
EligibleTime=2019-04-02T05:13:12</div>
<div> StartTime=2019-04-02T09:13:00
EndTime=2019-04-02T11:13:00 Deadline=N/A</div>
<div> PreemptTime=None SuspendTime=None
SecsPreSuspend=0</div>
<div> LastSchedEval=2019-04-02T05:21:32</div>
<div> Partition=test-backfill
AllocNode:Sid=computelab-frontend-02:12826</div>
<div> ReqNodeList=computelab-134
ExcNodeList=(null)</div>
<div> NodeList=(null)
SchedNodeList=computelab-134</div>
<div> NumNodes=1-1 NumCPUs=6 NumTasks=1
CPUs/Task=6 ReqB:S:C:T=0:0:*:*</div>
<div>
TRES=cpu=6,mem=32148M,node=1,gres/gpu=1,gres/gpu:tu104=1</div>
<div> Socks/Node=* NtasksPerN:B:S:C=0:0:*:*
CoreSpec=*</div>
<div> MinCPUsNode=6 MinMemoryNode=32148M
MinTmpDiskNode=0</div>
<div> Features=(null) DelayBoot=00:00:00</div>
<div> Gres=gpu:tu104:1 Reservation=(null)</div>
<div> OverSubscribe=OK Contiguous=0
Licenses=(null) Network=(null)</div>
<div> Command=/bin/bash</div>
<div> WorkDir=/home/rradmer</div>
<div> Power=</div>
</div>
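<div>For what it's worth, a quick arithmetic check of whether job 100818 fits into what is left on the node, using only numbers from the output above. This is a rough sketch: the function name fits and the dictionaries are hypothetical, and it compares plain CPU and memory totals only, so it says nothing about which cores are bound to which GPU or about how backfill reserves time slots.</div>

```python
def fits(node, job):
    """True if the node's unallocated CPUs and memory cover the job's minimums."""
    free_cpus = node["CPUTot"] - node["CPUAlloc"]
    free_mem = node["RealMemory"] - node["AllocMem"]
    return free_cpus >= job["MinCPUsNode"] and free_mem >= job["MinMemoryNode"]


# Numbers copied from 'scontrol -d show node computelab-134' above.
node_134 = {"CPUTot": 12, "CPUAlloc": 6, "RealMemory": 64307, "AllocMem": 32148}
# Numbers copied from 'scontrol -d show job 100818' above (memory in MB).
job_100818 = {"MinCPUsNode": 6, "MinMemoryNode": 32148}

print(fits(node_134, job_100818))
```

<div>By this count the node still has 6 CPUs and 32159 MB unallocated, so the pending tu104 job is not short of CPUs or memory in total.</div>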
<div><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Apr 1, 2019 at 11:24
PM Marcus Wagner &lt;<a href="mailto:wagner@itc.rwth-aachen.de" target="_blank">wagner@itc.rwth-aachen.de</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> Dear Randall,<br>
<br>
Could you please also provide the output of:<br>
<br>
<br>
scontrol -d show node computelab-134<br>
scontrol -d show job 100091<br>
scontrol -d show job 100094<br>
<br>
<br>
Best<br>
Marcus<br>
<br>
<div class="gmail-m_-2848306032173127196gmail-m_-3315689078268393495moz-cite-prefix">On
4/1/19 4:31 PM, Randall Radmer wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><span id="gmail-m_-2848306032173127196gmail-m_-3315689078268393495gmail-docs-internal-guid-cc4ba6cd-7fff-44e2-be99-bccade6b624b">
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">I can’t get backfill to work for a machine with two GPUs (one is a P4 and the other a T4).</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Submitting jobs works as expected: if the GPU I request is free, then my job runs, otherwise it goes into a pending state. But if I have pending jobs for one GPU ahead of pending jobs for the other GPU, I see blocking issues.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">More specifically, I can create a case where I am running a job on each of the GPUs and have a pending job waiting for the P4 followed by a pending job waiting for the T4. I would expect that if I exit the running T4 job, then backfill would start the pending T4 job, even though it has to jump ahead of the pending P4 job. This does not happen...</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">The following shows my jobs after I exited from a running T4 job, which had ID 100092:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">$ squeue --noheader -u rradmer --Format=jobid,state,gres,nodelist,reason | sed 's/  */ /g' | sort</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">100091 RUNNING gpu:gv100:1 computelab-134 None </span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">100093 PENDING gpu:gv100:1 Resources </span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">100094 PENDING gpu:tu104:1 Resources </span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">I can find no reason why 100094 doesn’t start running (I’ve waited up to an hour, just to make sure).</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">System config info and log snippets shown below.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Thanks much,</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Randy</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Node state corresponding to the squeue command, shown above:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">$ scontrol show node computelab-134 | grep -i [gt]res</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"> Gres=gpu:gv100:1,gpu:tu104:1</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"> CfgTRES=cpu=12,mem=64307M,billing=12,gres/gpu=2,gres/gpu:gv100=1,gres/gpu:tu104=1</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"> AllocTRES=cpu=6,mem=32148M,gres/gpu=1,gres/gpu:gv100=1</span></p>
<br>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Slurm config follows:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">$ scontrol show conf | grep -Ei '(gres|^Sched|prio|vers)' </span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">AccountingStorageTRES = cpu,mem,energy,node,billing,gres/gpu,gres/gpu:gp100,gres/gpu:gp104,gres/gpu:gv100,gres/gpu:tu102,gres/gpu:tu104,gres/gpu:tu106</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">GresTypes = gpu</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityParameters = (null)</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityDecayHalfLife = 7-00:00:00</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityCalcPeriod = 00:05:00</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityFavorSmall = No</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityFlags = </span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityMaxAge = 7-00:00:00</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityUsageResetPeriod = NONE</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityType = priority/multifactor</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityWeightAge = 0</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityWeightFairShare = 0</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityWeightJobSize = 0</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityWeightPartition = 0</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityWeightQOS = 0</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityWeightTRES = (null)</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PropagatePrioProcess = 0</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">SchedulerParameters = default_queue_depth=2000,bf_continue,bf_ignore_newly_avail_nodes,bf_max_job_test=1000,bf_window=10080,kill_invalid_depend</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">SchedulerTimeSlice = 30 sec</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">SchedulerType = sched/backfill</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">SLURM_VERSION = 17.11.9-2</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">GPUs on node:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"><span style="background-color:transparent;font-size:11pt">$ nvidia-smi --query-gpu=index,name,gpu_bus_id --format=csv</span>
</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">index, name, pci.bus_id</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">0, Tesla T4, 00000000:82:00.0</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">1, Tesla P4, 00000000:83:00.0</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">The gres file on node:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">$ cat /etc/slurm/gres.conf </span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Name=gpu Type=tu104 File=/dev/nvidia0 Cores=0,1,2,3,4,5</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Name=gpu Type=gp104 File=/dev/nvidia1 Cores=6,7,8,9,10,11</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Random sample of SlurmSchedLogFile:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">$ sudo tail -3 slurm.sched.log</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[2019-04-01T08:14:23.727] sched: Running job scheduler</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[2019-04-01T08:14:23.728] sched: JobId=100093. State=PENDING. Reason=Resources. Priority=1. Partition=test-backfill.</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[2019-04-01T08:14:23.728] sched: JobId=100094. State=PENDING. Reason=Resources. Priority=1. Partition=test-backfill.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Random sample of SlurmctldLogFile:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">$ sudo grep backfill slurmctld.log | tail -5</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[2019-04-01T08:16:53.281] backfill: beginning</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[2019-04-01T08:16:53.281] backfill test for JobID=100093 Prio=1 Partition=test-backfill</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[2019-04-01T08:16:53.281] backfill test for JobID=100094 Prio=1 Partition=test-backfill</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[2019-04-01T08:16:53.281] backfill: reached end of job queue</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[2019-04-01T08:16:53.281] backfill: completed testing 2(2) jobs, usec=707</span></p>
</span></div>
</blockquote>
<br>
<pre class="gmail-m_-2848306032173127196gmail-m_-3315689078268393495moz-signature" cols="72">--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
<a class="gmail-m_-2848306032173127196gmail-m_-3315689078268393495moz-txt-link-abbreviated" href="mailto:wagner@itc.rwth-aachen.de" target="_blank">wagner@itc.rwth-aachen.de</a>
<a class="gmail-m_-2848306032173127196gmail-m_-3315689078268393495moz-txt-link-abbreviated" href="http://www.itc.rwth-aachen.de" target="_blank">www.itc.rwth-aachen.de</a>
</pre>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</blockquote></div>