[slurm-users] Hi-prio jobs are bypassed by low-prio jobs

Michał Kadlof michal.kadlof at pw.edu.pl
Tue May 9 11:31:29 UTC 2023


Hi,

A few jobs with higher priority are giving way to jobs with lower priority, 
and I don't understand why.

I noticed that the high-priority jobs require 4 or 8 GPUs on a single node, 
while the bypassing jobs use only 1 GPU, but I'm not sure whether that is 
related. The high-priority jobs do get a value in the StartTime field, but 
that value is regularly pushed back to a later time. It looks as if, whenever 
a 1-GPU job finishes, Slurm immediately starts another 1-GPU job instead of 
waiting for the remaining 3 or 7 GPUs to be released for the higher-priority 
job. What could be wrong?
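
For reference, this is roughly how the queue can be inspected (the squeue 
format string below is just an example, not something specific to our setup):

$ sprio -l                                    # per-job priority factors
$ squeue -t PD -o "%.10i %.9Q %.8q %.20R %b"  # pending jobs: id, priority, QoS, reason, requested GPUs
$ scontrol show config | grep -i SchedulerParameters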

The jobs are submitted to the 'long' partition with a QoS named 'long'.

PartitionName=long
    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
    AllocNodes=ALL Default=NO QoS=long
    DefaultTime=2-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
    MaxNodes=UNLIMITED MaxTime=10-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
    Nodes=dgx-[1-4],sr-[1-3]
    PriorityJobFactor=1 PriorityTier=10000 RootOnly=NO ReqResv=NO OverSubscribe=NO
    OverTimeLimit=NONE PreemptMode=SUSPEND
    State=UP TotalCPUs=656 TotalNodes=7 SelectTypeParameters=NONE
    JobDefaults=(null)
    DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
    TRES=cpu=656,mem=8255731M,node=7,billing=3474,gres/gpu=32,gres/gpu:a100=32
    TRESBillingWeights=CPU=1,Mem=0.062G,GRES/gpu=72.458

   Name                     GrpTRES
------ ---------------------------
normal
   long  cpu=450,gres/gpu=28,mem=5T
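
For completeness, a listing like the one above can be produced with something 
along the lines of:

$ sacctmgr show qos format=Name,GrpTRES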


Example of a bypassed job, with sensitive data obscured:

$ scontrol show job 649800
JobId=649800 JobName=----train with motif
    UserId=XXXXXX(XXXX) GroupId=XXXXXX(XXXXX) MCS_label=N/A
    Priority=275000 Nice=0 Account=sfglab QOS=normal
    JobState=PENDING Reason=QOSGrpGRES Dependency=(null)
    Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
    RunTime=00:00:00 TimeLimit=03:00:00 TimeMin=N/A
    SubmitTime=2023-04-24T12:09:19 EligibleTime=2023-04-24T12:09:19
    AccrueTime=2023-04-24T12:09:19
    StartTime=2023-05-11T06:30:00 EndTime=2023-05-11T09:30:00 Deadline=N/A
    SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-05-09T13:16:46 Scheduler=Backfill:*
    Partition=long AllocNode:Sid=0.0.0.0:379113
    ReqNodeList=(null) ExcNodeList=(null)
    NodeList=
    NumNodes=1 NumCPUs=16 NumTasks=1 CPUs/Task=16 ReqB:S:C:T=0:0:*:*
    TRES=cpu=16,mem=32G,node=1,billing=16,gres/gpu=8
    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
    MinCPUsNode=16 MinMemoryNode=32G MinTmpDiskNode=0
    Features=(null) DelayBoot=00:00:00
    OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
    Command=/XXXXX
    WorkDir=/XXXXX
    StdErr=/XXXXXX
    StdIn=/dev/null
    StdOut=/XXXXX
    Power=
    TresPerNode=gres:gpu:8
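
What I mean by the StartTime being pushed back: re-checking the same job 
record from time to time (e.g. with something like the command below) shows 
the planned start moving further into the future.

$ scontrol show job 649800 | grep -o 'StartTime=[^ ]*'
StartTime=2023-05-11T06:30:00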

-- 
best regards
Michał Kadlof
Head of the high performance computing center
Eden^N cluster administrator
Structural and Functional Genomics Laboratory
Faculty of Mathematics and Computer Science
Warsaw University of Technology