<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <div class="flex flex-grow flex-col gap-3">
      <div class="min-h-[20px] flex flex-col items-start gap-4
        whitespace-pre-wrap break-words">
        <div class="markdown prose w-full break-words dark:prose-invert
          light">
          <p>Hi,<br>
            <br>
            Some jobs with higher priority are being overtaken by jobs
            with lower priority, and I don't understand why.</p>
          <p>I noticed that the high-priority jobs request 4 or 8 GPUs on
            a single node, while the jobs that overtake them use only 1
            GPU, but I'm not sure whether that is related.
            The high-priority jobs have a specific value in the StartTime
            field, but this value is regularly pushed back to a later
            time.
            It seems that after a 1-GPU job finishes, Slurm immediately
            schedules another 1-GPU job instead of waiting for the
            remaining 3 or 7 GPUs to be released for the higher-priority
            job. What could be causing this?</p>
          <p>The jobs are submitted to the 'long' partition, which has a
            partition QoS named long.</p>
          <p><font face="monospace">PartitionName=long<br>
                 AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL<br>
                 AllocNodes=ALL Default=NO QoS=long<br>
                 DefaultTime=2-00:00:00 DisableRootJobs=NO
              ExclusiveUser=NO GraceTime=0 Hidden=NO<br>
                 MaxNodes=UNLIMITED MaxTime=10-00:00:00 MinNodes=0
              LLN=NO MaxCPUsPerNode=UNLIMITED<br>
                 Nodes=dgx-[1-4],sr-[1-3]<br>
                 PriorityJobFactor=1 PriorityTier=10000 RootOnly=NO
              ReqResv=NO OverSubscribe=NO<br>
                 OverTimeLimit=NONE PreemptMode=SUSPEND<br>
                 State=UP TotalCPUs=656 TotalNodes=7
              SelectTypeParameters=NONE<br>
                 JobDefaults=(null)<br>
                 DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED<br>
                
TRES=cpu=656,mem=8255731M,node=7,billing=3474,gres/gpu=32,gres/gpu:a100=32<br>
                 TRESBillingWeights=CPU=1,Mem=0.062G,GRES/gpu=72.458</font></p>
          <p><font face="monospace">  Name                     GrpTRES <br>
              ------ --------------------------- <br>
              normal      <br>
                long  cpu=450,gres/gpu=28,mem=5T</font></p>
          <p>Example of a bypassed high-priority job (sensitive data
            obscured):<font
              face="monospace"><br>
              <br>
              $ scontrol show job 649800<br>
              JobId=649800 JobName=----train with motif <br>
                 UserId=XXXXXX(XXXX) GroupId=XXXXXX(XXXXX) MCS_label=N/A<br>
                 Priority=275000 Nice=0 Account=sfglab QOS=normal<br>
                 JobState=PENDING Reason=QOSGrpGRES Dependency=(null)<br>
                 Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0<br>
                 RunTime=00:00:00 TimeLimit=03:00:00 TimeMin=N/A<br>
                 SubmitTime=2023-04-24T12:09:19
              EligibleTime=2023-04-24T12:09:19<br>
                 AccrueTime=2023-04-24T12:09:19<br>
                 StartTime=2023-05-11T06:30:00
              EndTime=2023-05-11T09:30:00 Deadline=N/A<br>
                 SuspendTime=None SecsPreSuspend=0
              LastSchedEval=2023-05-09T13:16:46 Scheduler=Backfill:*<br>
                 Partition=long AllocNode:Sid=0.0.0.0:379113<br>
                 ReqNodeList=(null) ExcNodeList=(null)<br>
                 NodeList=<br>
                 NumNodes=1 NumCPUs=16 NumTasks=1 CPUs/Task=16
              ReqB:S:C:T=0:0:*:*<br>
                 TRES=cpu=16,mem=32G,node=1,billing=16,gres/gpu=8<br>
                 Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*<br>
                 MinCPUsNode=16 MinMemoryNode=32G MinTmpDiskNode=0<br>
                 Features=(null) DelayBoot=00:00:00<br>
                 OverSubscribe=OK Contiguous=0 Licenses=(null)
              Network=(null)<br>
                 Command=/XXXXX<br>
                 WorkDir=/XXXXX<br>
                 StdErr=/XXXXXX<br>
                 StdIn=/dev/null<br>
                 StdOut=/XXXXX<br>
                 Power=<br>
                 TresPerNode=gres:gpu:8</font></p>
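          <p>A minimal sketch of the group-limit arithmetic, assuming that
            the partition QoS long (GrpTRES gres/gpu=28) is what produces
            Reason=QOSGrpGRES for this job (the job itself runs under
            QOS=normal, which shows no GrpTRES) and that the cap counts the
            GPUs of all running jobs under that QoS. The helper names are
            hypothetical, and the code models only this single limit, not
            node layout or Slurm's actual backfill logic:</p>
<pre>
```python
# Hypothetical model of one limit only: the QoS GrpTRES gres/gpu=28 cap.
# Node layout, CPUs, memory and Slurm's real backfill algorithm are ignored.
GRP_GPU_LIMIT = 28  # from "long  cpu=450,gres/gpu=28,mem=5T"


def fits_under_grp_limit(gpus_in_use: int, gpus_requested: int,
                         limit: int = GRP_GPU_LIMIT) -> bool:
    """True if starting the job keeps the QoS-wide GPU total within the cap."""
    return limit - gpus_in_use >= gpus_requested


def starved_window(big: int = 8, small: int = 1,
                   limit: int = GRP_GPU_LIMIT) -> list[int]:
    """GPU usage levels at which a `small` job fits but a `big` job does not."""
    return [used for used in range(limit + 1)
            if fits_under_grp_limit(used, small, limit)
            and not fits_under_grp_limit(used, big, limit)]


if __name__ == "__main__":
    print(starved_window())       # [21, 22, 23, 24, 25, 26, 27]
    print(starved_window(big=4))  # [25, 26, 27]
```
</pre>
          <p>Under these assumptions, whenever jobs under the QoS hold 21
            to 27 of the 28 allowed GPUs, a 1-GPU job still fits under the
            cap while an 8-GPU job does not (for a 4-GPU job the window is
            25 to 27). That would be consistent with 1-GPU jobs repeatedly
            starting ahead of the larger job, although it does not by
            itself show how the backfill scheduler treats the pending
            job.</p>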
        </div>
      </div>
    </div>
    <div class="moz-signature">-- <br>
      best regards | pozdrawiam serdecznie<br>
      <b>Michał Kadlof</b><br>
      <table style="font-size:9pt;border: 1px solid
        transparent;padding:0 10px; border-collapse: collapse;">
        <tbody>
          <tr>
            <td style="font-style: italic;border: 1px solid
              transparent;padding:0 10px;">Head of the high performance
              computing center</td>
            <td style="font-style: italic;border: 1px solid
              transparent;padding:0 10px;">Kierownik ośrodka
              obliczeniowego HPC</td>
          </tr>
          <tr>
            <td style="font-style: italic;border: 1px solid
              transparent;padding:0 10px;">Eden<sup>N</sup> cluster
              administrator</td>
            <td style="font-style: italic;border: 1px solid
              transparent;padding:0 10px;">Administrator klastra
              obliczeniowego Eden<sup>N</sup></td>
          </tr>
          <tr>
            <td style="border: 1px solid transparent;padding:0
              10px;opacity:0.5;">Structural and Functional Genomics
              Laboratory</td>
            <td style="border: 1px solid transparent;padding:0
              10px;opacity:0.5;">Laboratorium Genomiki Strukturalnej i
              Funkcjonalnej</td>
          </tr>
          <tr>
            <td style="border: 1px solid transparent;padding:0
              10px;opacity:0.5;">Faculty of Mathematics and Computer
              Science</td>
            <td style="border: 1px solid transparent;padding:0
              10px;opacity:0.5;">Wydział Matematyki i Nauk
              Informacyjnych</td>
          </tr>
          <tr>
            <td style="border: 1px solid transparent;padding:0
              10px;opacity:0.5;">Warsaw University of Technology</td>
            <td style="border: 1px solid transparent;padding:0
              10px;opacity:0.5;">Politechnika Warszawska</td>
          </tr>
        </tbody>
      </table>
    </div>
  </body>
</html>