[slurm-users] How to force jobs to run next in queue

Sean Brisbane sean.brisbane at securelinx.com
Mon Mar 11 17:00:36 UTC 2019


Hi,

I'm looking for a way for an administrator to boost any job so that it runs next
when resources become available.  What is the best-practice way to do this?
Happy to try something new :-D

The way I thought to do this was to create a QOS with a large priority and
manually assign it to the job.  In this example, job 469 is the job I am
trying to elevate to the front of the queue.

scontrol update jobid=469 qos=boost
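
For context, the "boost" QOS itself was created with sacctmgr along these lines
(a minimal sketch; the priority value shown is illustrative and just needs to
dwarf the other priority factors):

sacctmgr add qos boost
sacctmgr modify qos boost set priority=10000000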

sprio shows that this job has the highest priority by quite some way;
however, job number 492 will be the next to run.

squeue (excluding running jobs)
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               469 Backgroun sleeping   centos PD       0:00      1 (Resources)
               492  Priority sleepy.s superuse PD       0:00      1 (Resources)
               448 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
               478 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
               479 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
               480 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
               481 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
               482 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
               483 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
               484 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
               449 Backgroun sleepy.s superuse PD       0:00      1 (Resources)
               450 Backgroun sleepy.s superuse PD       0:00      1 (Resources)
               465 Backgroun sleeping   centos PD       0:00      1 (Resources)
               466 Backgroun sleeping   centos PD       0:00      1 (Resources)
               467 Backgroun sleeping   centos PD       0:00      1 (Resources)


[root at master yp]# sprio
          JOBID PARTITION   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS
            448 Backgroun      13667         58        484       3125      10000          0
            449 Backgroun      13205         58         23       3125      10000          0
            450 Backgroun      13205         58         23       3125      10000          0
            465 Backgroun      13157         32          0       3125      10000          0
            466 Backgroun      13157         32          0       3125      10000          0
            467 Backgroun      13157         32          0       3125      10000          0
            469 Backgroun   10013157         32          0       3125      10000   10000000
            478 Backgroun      13640         32        484       3125      10000          0
            479 Backgroun      13640         32        484       3125      10000          0
            480 Backgroun      13640         32        484       3125      10000          0
            481 Backgroun      13610         32        454       3125      10000          0
            482 Backgroun      13610         32        454       3125      10000          0
            483 Backgroun      13610         32        454       3125      10000          0
            484 Backgroun      13610         32        454       3125      10000          0
            492 Priority     1003158         11         23       3125    1000000          0


I'm trying to troubleshoot why the highest-priority job is not next to run;
jobs in the partition called "Priority" seem to run first.

Job 469 has no QOS, partition, user, account, or group limits on the number
of CPUs, jobs, nodes, etc.  I've set this test cluster up from scratch to be
sure!
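
For reference, the limits and priority settings can be inspected with
something like the following (a sketch; the exact format fields may vary
between Slurm versions):

scontrol show config | grep -i priority
sacctmgr show qos format=Name,Priority,MaxJobsPU,MaxTRESPU
sacctmgr show assoc where user=centos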

[root at master yp]# scontrol show job 469
JobId=469 JobName=sleeping.sh
   UserId=centos(1000) GroupId=centos(1000) MCS_label=N/A
   Priority=10013161 Nice=0 Account=default QOS=boost
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2019-03-11T16:01:20 EligibleTime=2019-03-11T16:01:20
   StartTime=2020-03-10T15:23:40 EndTime=Unknown Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2019-03-11T16:54:44
   Partition=Background AllocNode:Sid=master:1322
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Gres=(null) Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/centos/sleeping.sh
   WorkDir=/home/centos
   StdErr=/home/centos/sleeping.sh.e469
   StdIn=/dev/null
   StdOut=/home/centos/sleeping.sh.o469
   Power=

The partition called "Priority" has a priority boost and a partition QOS
assigned in its definition:

PartitionName=Priority Nodes=compute[01-02] Default=NO MaxTime=INFINITE State=UP Priority=1000 QOS=Priority
PartitionName=Background Nodes=compute[01-02] Default=YES MaxTime=INFINITE State=UP Priority=10
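
The live partition and QOS settings can be double-checked with, for example
(again just a sketch):

scontrol show partition Priority
scontrol show partition Background
sacctmgr show qos Priority format=Name,Priority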

Any ideas would be much appreciated.

Sean



-- 

Sean Brisbane | Linux Systems Specialist

Securelinx Ltd., Pottery Road, Dun Laoghaire, Co. Dublin.
Registered in Ireland No. 357396
www.securelinx.com - Linux Leaders in Ireland