[slurm-users] Preemption not working for jobs in higher priority partition

Brian Andrus toomuchit at gmail.com
Fri Aug 20 13:38:14 UTC 2021


IIRC, preemption is determined by partition first, not node.

Since your pending job is in the 'day' partition, it will not preempt 
something in the 'night' partition (even if the node is in both).

Brian Andrus

On 8/19/2021 2:49 PM, Russell Jones wrote:
> Hi all,
>
> I could use some help understanding why preemption is not working 
> properly for me. I have a job that is blocking other jobs, and I don't 
> understand why. Any assistance is appreciated, thank you!
>
>
> I have two partitions defined in Slurm, a daytime partition and a 
> nighttime partition:
>
>     Day partition: PriorityTier of 5, always Up. Resources are limited
>     under this QOS.
>     Night partition: PriorityTier of 5 at night; during the day it is
>     set to Down and its PriorityTier is changed to 1. Jobs can be
>     submitted to the night queue under an unlimited QOS as long as
>     resources are available.
>
>     The idea is that jobs can keep running in the night partition,
>     even during the day, until resources are requested from the day
>     partition. Jobs in the night partition would then be requeued or
>     canceled to satisfy those requests.
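The day/night switch described above could be scripted. A minimal sketch, assuming the switch is applied with `scontrol update` from a cron-style job; the 20:00 to 06:00 night window and the function names are hypothetical, since the post does not say when night starts or ends:

```python
from datetime import datetime, time

# Hypothetical night window: the post does not give the actual hours.
NIGHT_START = time(20, 0)
NIGHT_END = time(6, 0)


def is_night(now: time) -> bool:
    """True if `now` falls inside the night window, which wraps past midnight."""
    return now >= NIGHT_START or now < NIGHT_END


def night_partition_update(now: time) -> list:
    """Build (but do not run) the scontrol command that puts the night
    partition into the state described in the post: Up with PriorityTier=5
    at night, Down with PriorityTier=1 during the day."""
    if is_night(now):
        state, tier = "UP", 5
    else:
        state, tier = "DOWN", 1
    return [
        "scontrol", "update", "PartitionName=night",
        f"State={state}", f"PriorityTier={tier}",
    ]


if __name__ == "__main__":
    # Print the command for the current time instead of executing it.
    print(" ".join(night_partition_update(datetime.now().time())))
```

The function only builds the command; actually running it (for example with `subprocess.run(cmd, check=True)`) typically requires Slurm administrator privileges. Setting a partition Down stops new jobs from starting there but leaves running jobs in place, which matches the behavior the post relies on.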
>
>
>
> Current output of "scontrol show part" :
>
>     PartitionName=day
>        AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
>        AllocNodes=ALL Default=NO QoS=part_day
>        DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
>        MaxNodes=UNLIMITED MaxTime=1-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
>        Nodes=cluster-r1n[01-13],cluster-r2n[01-08]
>        PriorityJobFactor=1 PriorityTier=5 RootOnly=NO ReqResv=NO OverSubscribe=NO
>        OverTimeLimit=NONE PreemptMode=REQUEUE
>        State=UP TotalCPUs=336 TotalNodes=21 SelectTypeParameters=NONE
>        JobDefaults=(null)
>        DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
>
>
>     PartitionName=night
>        AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
>        AllocNodes=ALL Default=NO QoS=part_night
>        DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
>        MaxNodes=22 MaxTime=7-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
>        Nodes=cluster-r1n[01-13],cluster-r2n[01-08]
>        PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
>        OverTimeLimit=NONE PreemptMode=REQUEUE
>        State=DOWN TotalCPUs=336 TotalNodes=21 SelectTypeParameters=NONE
>        JobDefaults=(null)
>        DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
>
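When comparing the two partitions it can help to pull the relevant fields out programmatically. A minimal sketch, assuming whitespace-separated `Key=Value` tokens as in the output above (`scontrol -o show partition` prints each partition on a single line, which suits this); `parse_scontrol` is a hypothetical helper, and the two strings are trimmed excerpts of the output above:

```python
# Trimmed excerpts of the `scontrol show part` output above.
DAY = """PartitionName=day Nodes=cluster-r1n[01-13],cluster-r2n[01-08]
PriorityTier=5 PreemptMode=REQUEUE State=UP"""
NIGHT = """PartitionName=night Nodes=cluster-r1n[01-13],cluster-r2n[01-08]
PriorityTier=1 PreemptMode=REQUEUE State=DOWN"""


def parse_scontrol(text: str) -> dict:
    """Collect whitespace-separated Key=Value tokens into a dict.

    Splits each token on its first '=' only, so values such as
    'cluster-r1n[01-13],cluster-r2n[01-08]' or 'cluster-1:1341505' stay
    intact. Tokens without '=' are ignored. Values containing spaces are
    not supported.
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


day = parse_scontrol(DAY)
night = parse_scontrol(NIGHT)

# Same node list, and day has the strictly higher PriorityTier.
print(day["Nodes"] == night["Nodes"],
      int(day["PriorityTier"]) > int(night["PriorityTier"]))  # True True
```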
>
>
>
> I currently have a job in the night partition that is blocking jobs in 
> the day partition, even though the day partition has a PriorityTier of 
> 5 and the night partition is Down with a PriorityTier of 1.
>
> My current slurm.conf preemption settings are:
>
>     PreemptMode=REQUEUE
>     PreemptType=preempt/partition_prio
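These settings select `preempt/partition_prio`, under which, per the Slurm documentation, jobs in a partition with a higher PriorityTier may preempt jobs in lower-tier partitions. The sketch below is a deliberately simplified model of that expectation, not Slurm's scheduler logic: the `Partition` class and `can_preempt` function are hypothetical, and the model ignores QOS-based preemption, GraceTime, PreemptExemptTime and any special handling of a partition that is Down. Under this model the day job should be able to preempt the night job, which is exactly the behavior the poster is not seeing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    priority_tier: int
    nodes: frozenset
    preempt_mode: str = "REQUEUE"


def can_preempt(pending: Partition, running: Partition, job_nodes: set) -> bool:
    """Simplified partition_prio check (hypothetical model, not Slurm code)."""
    # Jobs in a partition with PreemptMode=OFF are not preempted.
    if running.preempt_mode.upper() == "OFF":
        return False
    # Only a strictly higher PriorityTier may preempt; equal tiers do not.
    if pending.priority_tier <= running.priority_tier:
        return False
    # The running job must hold nodes that the pending job's partition can use.
    return bool(set(job_nodes) & pending.nodes)


# Values taken from the output in this thread.
NODES = frozenset(
    [f"cluster-r1n{i:02d}" for i in range(1, 14)]
    + [f"cluster-r2n{i:02d}" for i in range(1, 9)]
)
day = Partition("day", 5, NODES)
night = Partition("night", 1, NODES)
blocking_job_nodes = {"cluster-r1n12", "cluster-r1n13",
                      "cluster-r2n04", "cluster-r2n05", "cluster-r2n06"}

print(can_preempt(day, night, blocking_job_nodes))  # True
print(can_preempt(night, day, blocking_job_nodes))  # False
```

Note that at night, when both partitions have PriorityTier 5, this model allows no preemption in either direction, since the tiers are equal.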
>
>
>
> The blocking job's scontrol show job output is:
>
>     JobId=105713 JobName=jobname
>        Priority=1986 Nice=0 Account=xxx QOS=normal
>        JobState=RUNNING Reason=None Dependency=(null)
>        Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>        RunTime=17:49:39 TimeLimit=7-00:00:00 TimeMin=N/A
>        SubmitTime=2021-08-18T22:36:36 EligibleTime=2021-08-18T22:36:36
>        AccrueTime=2021-08-18T22:36:36
>        StartTime=2021-08-18T22:36:39 EndTime=2021-08-25T22:36:39 Deadline=N/A
>        PreemptEligibleTime=2021-08-18T22:36:39 PreemptTime=None
>        SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-08-18T22:36:39
>        Partition=night AllocNode:Sid=cluster-1:1341505
>        ReqNodeList=(null) ExcNodeList=(null)
>        NodeList=cluster-r1n[12-13],cluster-r2n[04-06]
>        BatchHost=cluster-r1n12
>        NumNodes=5 NumCPUs=80 NumTasks=5 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>        TRES=cpu=80,node=5,billing=80,gres/gpu=20
>        Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>        MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>        Features=(null) DelayBoot=00:00:00
>        OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
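To confirm that the blocking job sits on nodes the day partition can also use, its NodeList can be expanded and compared with the partition's Nodes. `scontrol show hostnames cluster-r1n[12-13],cluster-r2n[04-06]` does this authoritatively; the sketch below is a hypothetical, simplified Python equivalent that handles only `prefix[ranges]` entries with no suffix after the bracket:

```python
import re


def expand_hostlist(hostlist: str) -> list:
    """Expand a simple Slurm-style hostlist (hypothetical helper).

    Supports entries like 'node07' or 'prefix[01-13,15]'. Zero padding
    follows the width of each range's start value. Suffixes after the
    bracket and nested brackets are not supported.
    """
    hosts = []
    # Each match is (prefix, bracket contents); commas inside brackets stay together.
    for prefix, ranges in re.findall(r"([^,\[\]]+)(?:\[([^\]]*)\])?", hostlist):
        if not ranges:
            hosts.append(prefix)
            continue
        for part in ranges.split(","):
            start, _, end = part.partition("-")
            width = len(start)
            for n in range(int(start), int(end or start) + 1):
                hosts.append(f"{prefix}{n:0{width}d}")
    return hosts


blocking = expand_hostlist("cluster-r1n[12-13],cluster-r2n[04-06]")
day_nodes = set(expand_hostlist("cluster-r1n[01-13],cluster-r2n[01-08]"))
print(len(blocking), set(blocking) <= day_nodes)  # 5 True
```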
>
>
>
> The job that is being blocked:
>
>     JobId=105876 JobName=bash
>        Priority=2103 Nice=0 Account=xxx QOS=normal
>        JobState=PENDING Reason=Nodes_required_for_job_are_DOWN,_DRAINED_or_reserved_for_jobs_in_higher_priority_partitions Dependency=(null)
>        Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
>        RunTime=00:00:00 TimeLimit=1-00:00:00 TimeMin=N/A
>        SubmitTime=2021-08-19T16:19:23 EligibleTime=2021-08-19T16:19:23
>        AccrueTime=2021-08-19T16:19:23
>        StartTime=Unknown EndTime=Unknown Deadline=N/A
>        SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-08-19T16:26:43
>        Partition=day AllocNode:Sid=cluster-1:2776451
>        ReqNodeList=(null) ExcNodeList=(null)
>        NodeList=(null)
>        NumNodes=3 NumCPUs=40 NumTasks=40 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>        TRES=cpu=40,node=1,billing=40
>        Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>        MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>        Features=(null) DelayBoot=00:00:00
>        OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
>
>
>
> Why is the day job not preempting the night job?