<div dir="ltr"><div dir="ltr">Thanks for your suggestions, Marcus.  I have restarted services and have also been messing with various parameters (probably more than I should).  Nothing seems to help.</div><div dir="ltr"><br></div><div>I'm not ready to upgrade to Slurm 18, so I guess I'll have to live with it...</div><div><br></div><div>Best,</div><div>Randy</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Apr 3, 2019 at 1:50 AM Marcus Wagner &lt;<a href="mailto:wagner@itc.rwth-aachen.de">wagner@itc.rwth-aachen.de</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF">
    Hmm...,<br>
    <br>
    I'm a bit puzzled; everything seems to be OK as far as I can tell.<br>
    <br>
    Did you try to restart slurmctld?<br>
    I had a case where users could no longer submit to the default
    partition, because SLURM told them (if I remember right) <br>
    wrong account/partition combination<br>
    or something like that.<br>
    My first suspicion was my submission script since I changed it
    recently, but I could not find any error. scontrol reconfig did not
    help. <br>
    But everything worked again after I restarted slurmctld.<br>
    <br>
    Might be worth a try.<br>
    <br>
    <br>
    Best<br>
    Marcus<br>
    <br>
    <div class="gmail-m_-2848306032173127196moz-cite-prefix">On 4/2/19 1:24 PM, Randall Radmer
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div dir="ltr">
          <div dir="ltr">
            <div dir="ltr">
              <div dir="ltr">
                <div dir="ltr">
                  <div dir="ltr">Hi Marcus,
                    <div><br>
                    </div>
                    <div>Following jobs are running or pending after I
                      killed job 100816, which was running on
                      computelab-134's T4:</div>
                    <div>
                      <div>100815 RUNNING computelab-134 gpu:gv100:1
                        None1</div>
                      <div>100817 PENDING gpu:gv100:1 Resources1</div>
                      <div>100818 PENDING gpu:tu104:1 Resources1</div>
                    </div>
                    <div><br>
                    </div>
                    <div>
                      <div>$ scontrol -d show node computelab-134</div>
                      <div>NodeName=computelab-134 Arch=x86_64
                        CoresPerSocket=6</div>
                      <div>   CPUAlloc=6 CPUErr=0 CPUTot=12 CPULoad=0.00</div>
                      <div>   AvailableFeatures=(null)</div>
                      <div>   ActiveFeatures=(null)</div>
                      <div>   Gres=gpu:gv100:1,gpu:tu104:1</div>
                      <div>   GresDrain=N/A</div>
                      <div> 
                         GresUsed=gpu:gv100:1(IDX:0),gpu:tu104:0(IDX:N/A)</div>
                      <div>   NodeAddr=computelab-134
                        NodeHostName=computelab-134 Version=17.11</div>
                      <div>   OS=Linux 4.4.0-143-generic #169-Ubuntu SMP
                        Thu Feb 7 07:56:38 UTC 2019 </div>
                      <div>   RealMemory=64307 AllocMem=32148
                        FreeMem=61126 Sockets=2 Boards=1</div>
                      <div>   State=MIXED ThreadsPerCore=1
                        TmpDisk=404938 Weight=1 Owner=N/A MCS_label=N/A</div>
                      <div>   Partitions=test-backfill </div>
                      <div>   BootTime=2019-03-29T12:09:25
                        SlurmdStartTime=2019-04-01T11:34:35</div>
                      <div> 
 CfgTRES=cpu=12,mem=64307M,billing=12,gres/gpu=2,gres/gpu:gv100=1,gres/gpu:tu104=1</div>
                      <div> 
                         AllocTRES=cpu=6,mem=32148M,gres/gpu=1,gres/gpu:gv100=1</div>
                      <div>   CapWatts=n/a</div>
                      <div>   CurrentWatts=0 LowestJoules=0
                        ConsumedJoules=0</div>
                      <div>   ExtSensorsJoules=n/s ExtSensorsWatts=0
                        ExtSensorsTemp=n/s</div>
                    </div>
                    <div><br>
                    </div>
                    <div>
                      <div>$ scontrol -d show job 100815</div>
                      <div>JobId=100815 JobName=bash</div>
                      <div>   UserId=rradmer(27578) GroupId=hardware(30)
                        MCS_label=N/A</div>
                      <div>   Priority=1 Nice=0 Account=cag QOS=normal</div>
                      <div>   JobState=RUNNING Reason=None
                        Dependency=(null)</div>
                      <div>   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0
                        ExitCode=0:0</div>
                      <div>   DerivedExitCode=0:0</div>
                      <div>   RunTime=00:06:45 TimeLimit=02:00:00
                        TimeMin=N/A</div>
                      <div>   SubmitTime=2019-04-02T05:13:05
                        EligibleTime=2019-04-02T05:13:05</div>
                      <div>   StartTime=2019-04-02T05:13:05
                        EndTime=2019-04-02T07:13:05 Deadline=N/A</div>
                      <div>   PreemptTime=None SuspendTime=None
                        SecsPreSuspend=0</div>
                      <div>   LastSchedEval=2019-04-02T05:13:05</div>
                      <div>   Partition=test-backfill
                        AllocNode:Sid=computelab-frontend-02:7873</div>
                      <div>   ReqNodeList=computelab-134
                        ExcNodeList=(null)</div>
                      <div>   NodeList=computelab-134</div>
                      <div>   BatchHost=computelab-134</div>
                      <div>   NumNodes=1 NumCPUs=6 NumTasks=1
                        CPUs/Task=6 ReqB:S:C:T=0:0:*:*</div>
                      <div> 
                         TRES=cpu=6,mem=32148M,node=1,billing=6,gres/gpu=1,gres/gpu:gv100=1</div>
                      <div>   Socks/Node=* NtasksPerN:B:S:C=0:0:*:*
                        CoreSpec=*</div>
                      <div>     Nodes=computelab-134 CPU_IDs=0-5
                        Mem=32148 GRES_IDX=gpu:gv100(IDX:0)</div>
                      <div>   MinCPUsNode=6 MinMemoryNode=32148M
                        MinTmpDiskNode=0</div>
                      <div>   Features=(null) DelayBoot=00:00:00</div>
                      <div>   Gres=gpu:gv100:1 Reservation=(null)</div>
                      <div>   OverSubscribe=OK Contiguous=0
                        Licenses=(null) Network=(null)</div>
                      <div>   Command=/bin/bash</div>
                      <div>   WorkDir=/home/rradmer</div>
                      <div>   Power=</div>
                    </div>
                    <div><br>
                    </div>
                    <div>
                      <div>$ scontrol -d show job 100817</div>
                      <div>JobId=100817 JobName=bash</div>
                      <div>   UserId=rradmer(27578) GroupId=hardware(30)
                        MCS_label=N/A</div>
                      <div>   Priority=1 Nice=0 Account=cag QOS=normal</div>
                      <div>   JobState=PENDING Reason=Resources
                        Dependency=(null)</div>
                      <div>   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0
                        ExitCode=0:0</div>
                      <div>   DerivedExitCode=0:0</div>
                      <div>   RunTime=00:00:00 TimeLimit=02:00:00
                        TimeMin=N/A</div>
                      <div>   SubmitTime=2019-04-02T05:13:11
                        EligibleTime=2019-04-02T05:13:11</div>
                      <div>   StartTime=2019-04-02T07:13:05
                        EndTime=2019-04-02T09:13:05 Deadline=N/A</div>
                      <div>   PreemptTime=None SuspendTime=None
                        SecsPreSuspend=0</div>
                      <div>   LastSchedEval=2019-04-02T05:20:44</div>
                      <div>   Partition=test-backfill
                        AllocNode:Sid=computelab-frontend-03:21736</div>
                      <div>   ReqNodeList=computelab-134
                        ExcNodeList=(null)</div>
                      <div>   NodeList=(null)
                        SchedNodeList=computelab-134</div>
                      <div>   NumNodes=1-1 NumCPUs=6 NumTasks=1
                        CPUs/Task=6 ReqB:S:C:T=0:0:*:*</div>
                      <div> 
                         TRES=cpu=6,mem=32148M,node=1,gres/gpu=1,gres/gpu:gv100=1</div>
                      <div>   Socks/Node=* NtasksPerN:B:S:C=0:0:*:*
                        CoreSpec=*</div>
                      <div>   MinCPUsNode=6 MinMemoryNode=32148M
                        MinTmpDiskNode=0</div>
                      <div>   Features=(null) DelayBoot=00:00:00</div>
                      <div>   Gres=gpu:gv100:1 Reservation=(null)</div>
                      <div>   OverSubscribe=OK Contiguous=0
                        Licenses=(null) Network=(null)</div>
                      <div>   Command=/bin/bash</div>
                      <div>   WorkDir=/home/rradmer</div>
                      <div>   Power=</div>
                    </div>
                    <div><br>
                    </div>
                    <div>
                      <div>$ scontrol -d show job 100818</div>
                      <div>JobId=100818 JobName=bash</div>
                      <div>   UserId=rradmer(27578) GroupId=hardware(30)
                        MCS_label=N/A</div>
                      <div>   Priority=1 Nice=0 Account=cag QOS=normal</div>
                      <div>   JobState=PENDING Reason=Resources
                        Dependency=(null)</div>
                      <div>   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0
                        ExitCode=0:0</div>
                      <div>   DerivedExitCode=0:0</div>
                      <div>   RunTime=00:00:00 TimeLimit=02:00:00
                        TimeMin=N/A</div>
                      <div>   SubmitTime=2019-04-02T05:13:12
                        EligibleTime=2019-04-02T05:13:12</div>
                      <div>   StartTime=2019-04-02T09:13:00
                        EndTime=2019-04-02T11:13:00 Deadline=N/A</div>
                      <div>   PreemptTime=None SuspendTime=None
                        SecsPreSuspend=0</div>
                      <div>   LastSchedEval=2019-04-02T05:21:32</div>
                      <div>   Partition=test-backfill
                        AllocNode:Sid=computelab-frontend-02:12826</div>
                      <div>   ReqNodeList=computelab-134
                        ExcNodeList=(null)</div>
                      <div>   NodeList=(null)
                        SchedNodeList=computelab-134</div>
                      <div>   NumNodes=1-1 NumCPUs=6 NumTasks=1
                        CPUs/Task=6 ReqB:S:C:T=0:0:*:*</div>
                      <div> 
                         TRES=cpu=6,mem=32148M,node=1,gres/gpu=1,gres/gpu:tu104=1</div>
                      <div>   Socks/Node=* NtasksPerN:B:S:C=0:0:*:*
                        CoreSpec=*</div>
                      <div>   MinCPUsNode=6 MinMemoryNode=32148M
                        MinTmpDiskNode=0</div>
                      <div>   Features=(null) DelayBoot=00:00:00</div>
                      <div>   Gres=gpu:tu104:1 Reservation=(null)</div>
                      <div>   OverSubscribe=OK Contiguous=0
                        Licenses=(null) Network=(null)</div>
                      <div>   Command=/bin/bash</div>
                      <div>   WorkDir=/home/rradmer</div>
                      <div>   Power=</div>
                    </div>
                    <div><br>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Mon, Apr 1, 2019 at 11:24
          PM Marcus Wagner &lt;<a href="mailto:wagner@itc.rwth-aachen.de" target="_blank">wagner@itc.rwth-aachen.de</a>&gt;
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div bgcolor="#FFFFFF"> Dear Randall,<br>
            <br>
            could you please also provide the output of the following commands:<br>
            <br>
            <br>
            scontrol -d show node computelab-134<br>
            scontrol -d show job 100091<br>
            scontrol -d show job 100094<br>
            <br>
            <br>
            Best<br>
            Marcus<br>
            <br>
            <div class="gmail-m_-2848306032173127196gmail-m_-3315689078268393495moz-cite-prefix">On
              4/1/19 4:31 PM, Randall Radmer wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="ltr"><span id="gmail-m_-2848306032173127196gmail-m_-3315689078268393495gmail-docs-internal-guid-cc4ba6cd-7fff-44e2-be99-bccade6b624b">
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">I can’t get backfill to work for a machine with two GPUs (one is a P4 and the other a T4).</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Submitting jobs works as expected: if the GPU I request is free, then my job runs, otherwise it goes into a pending state.  But if I have pending jobs for one GPU ahead of pending jobs for the other GPU, I see blocking issues.</span></p>
                  <br>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">More specifically, I can create a case where I am running a job on each of the GPUs and have a pending job waiting for the P4 followed by a pending job waiting for the T4.  I would expect that if I exit the running T4 job, then backfill would start the pending T4 job, even though it has to jump ahead of the pending P4 job.  This does not happen...</span></p>
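                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">To make that expectation concrete, here is a minimal Python sketch of the behaviour I would expect. It is a toy model, not Slurm's actual backfill code: it assumes one GPU per type on the node and ignores CPUs, memory and core binding. The Job class, the backfill function and the 120-minute figures are made up for illustration.</span></p>
<pre>
```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    gpu_type: str    # GRES type requested, e.g. "gv100" or "tu104"
    time_limit: int  # requested wall time in minutes


def backfill(free_at, pending, now=0):
    """Toy backfill pass for one node with one GPU per type.

    free_at maps a GPU type to the minute at which that GPU becomes free.
    pending is the list of waiting jobs in priority order.
    Each job is planned at the earliest minute its GPU type is free, so a
    lower-priority job may start first as long as it wants a different GPU
    and therefore cannot delay a higher-priority job.
    Returns a dict mapping job_id to planned start minute.
    """
    free_at = dict(free_at)  # work on a copy, leave the caller's dict alone
    plan = {}
    for job in pending:
        start = max(now, free_at[job.gpu_type])
        plan[job.job_id] = start
        # Reserve this GPU type until the job's time limit runs out.
        free_at[job.gpu_type] = start + job.time_limit
    return plan


# Assumed state: the gv100 is busy for another 120 minutes, the tu104 is idle.
free_at = {"gv100": 120, "tu104": 0}
pending = [
    Job(100093, "gv100", 120),  # first in the queue, must wait for the gv100
    Job(100094, "tu104", 120),  # second in the queue, its GPU is already free
]

plan = backfill(free_at, pending)
print(plan)
```
</pre>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">In this toy model the pending tu104 job is planned to start immediately, ahead of the gv100 job that sits before it in the queue, because starting it does not delay that job.</span></p>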
                  <br>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">The following shows my jobs after I exited from a running T4 job, which had ID 100092:</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">$ squeue --noheader -u rradmer --Format=jobid,state,gres,nodelist,reason | sed 's/  */ /g' | sort</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">100091 RUNNING gpu:gv100:1 computelab-134 None </span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">100093 PENDING gpu:gv100:1 Resources </span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">100094 PENDING gpu:tu104:1 Resources </span></p>
                  <br>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">I can find no reason why 100094 doesn’t start running (I’ve waited up to an hour, just to make sure).</span></p>
                  <br>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">System config info and log snippets shown below.</span></p>
                  <br>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Thanks much,</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Randy</span></p>
                  <br>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Node state corresponding to the squeue output shown above:</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">$ scontrol show node computelab-134 | grep -i [gt]res</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">   Gres=gpu:gv100:1,gpu:tu104:1</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">   CfgTRES=cpu=12,mem=64307M,billing=12,gres/gpu=2,gres/gpu:gv100=1,gres/gpu:tu104=1</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">   AllocTRES=cpu=6,mem=32148M,gres/gpu=1,gres/gpu:gv100=1</span></p>
                  <br>
                  <br>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Slurm config follows:</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">$ scontrol show conf | grep -Ei '(gres|^Sched|prio|vers)' </span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">AccountingStorageTRES = cpu,mem,energy,node,billing,gres/gpu,gres/gpu:gp100,gres/gpu:gp104,gres/gpu:gv100,gres/gpu:tu102,gres/gpu:tu104,gres/gpu:tu106</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">GresTypes               = gpu</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityParameters      = (null)</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityDecayHalfLife   = 7-00:00:00</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityCalcPeriod      = 00:05:00</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityFavorSmall      = No</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityFlags           = </span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityMaxAge          = 7-00:00:00</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityUsageResetPeriod = NONE</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityType            = priority/multifactor</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityWeightAge       = 0</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityWeightFairShare = 0</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityWeightJobSize   = 0</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityWeightPartition = 0</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityWeightQOS       = 0</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PriorityWeightTRES      = (null)</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">PropagatePrioProcess    = 0</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">SchedulerParameters     = default_queue_depth=2000,bf_continue,bf_ignore_newly_avail_nodes,bf_max_job_test=1000,bf_window=10080,kill_invalid_depend</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">SchedulerTimeSlice      = 30 sec</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">SchedulerType           = sched/backfill</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">SLURM_VERSION           = 17.11.9-2</span></p>
                  <br>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">GPUs on node:</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">$ nvidia-smi --query-gpu=index,name,gpu_bus_id --format=csv</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">index, name, pci.bus_id</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">0, Tesla T4, 00000000:82:00.0</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">1, Tesla P4, 00000000:83:00.0</span></p>
                  <br>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">The gres file on node:</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">$ cat /etc/slurm/gres.conf </span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Name=gpu Type=tu104 File=/dev/nvidia0 Cores=0,1,2,3,4,5</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Name=gpu Type=gp104 File=/dev/nvidia1 Cores=6,7,8,9,10,11</span></p>
                  <br>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Random sample of SlurmSchedLogFile:</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">$ sudo tail -3 slurm.sched.log</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[2019-04-01T08:14:23.727] sched: Running job scheduler</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[2019-04-01T08:14:23.728] sched: JobId=100093. State=PENDING. Reason=Resources. Priority=1. Partition=test-backfill.</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[2019-04-01T08:14:23.728] sched: JobId=100094. State=PENDING. Reason=Resources. Priority=1. Partition=test-backfill.</span></p>
                  <br>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Random sample of SlurmctldLogFile:</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">$ sudo grep backfill slurmctld.log  | tail -5</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[2019-04-01T08:16:53.281] backfill: beginning</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[2019-04-01T08:16:53.281] backfill test for JobID=100093 Prio=1 Partition=test-backfill</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[2019-04-01T08:16:53.281] backfill test for JobID=100094 Prio=1 Partition=test-backfill</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[2019-04-01T08:16:53.281] backfill: reached end of job queue</span></p>
                  <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[2019-04-01T08:16:53.281] backfill: completed testing 2(2) jobs, usec=707</span></p>
                </span></div>
            </blockquote>
            <br>
            <pre class="gmail-m_-2848306032173127196gmail-m_-3315689078268393495moz-signature" cols="72">-- 
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
<a class="gmail-m_-2848306032173127196gmail-m_-3315689078268393495moz-txt-link-abbreviated" href="mailto:wagner@itc.rwth-aachen.de" target="_blank">wagner@itc.rwth-aachen.de</a>
<a class="gmail-m_-2848306032173127196gmail-m_-3315689078268393495moz-txt-link-abbreviated" href="http://www.itc.rwth-aachen.de" target="_blank">www.itc.rwth-aachen.de</a>
</pre>
          </div>
        </blockquote>
      </div>
    </blockquote>
    <br>
  </div>

</blockquote></div>