[slurm-users] How to force jobs to run next in queue

Sean Brisbane sean.brisbane at securelinx.com
Tue Mar 12 16:39:19 UTC 2019


Hi,

Thanks for your help.

Neither setting a qos nor setting the priority works for me.  However, I have
found the cause, if not the reason.

Using a Priority setting on the partition called "Priority" in slurm.conf
seems to force all jobs waiting on this queue to run first regardless of
any qos set on a job.  Priority is not a limit, but I think this is a bit
inconsistent with the limit hierarchy we see elsewhere and possibly even a
bug.

1. Partition QOS limit
*2. Job QOS limit*
3. User association
4. Account association(s), ascending the hierarchy
5. Root/Cluster association
*6. Partition limit*
7. None

So for multiple partitions with differing priorities, I can get the same
effect by moving the priority into a qos, applying that qos to the partition,
and then taking care to set the OverPartQOS flag on the "boost" qos.

Does anyone have a feeling for why setting a high Priority on a partition
makes jobs in that partition run first, even when a job in a different
partition has a much higher overall priority?
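
To make the question concrete, here is a toy model of the two orderings. It
is only a sketch of the behaviour I observe, not Slurm's scheduler code. The
assumption (labelled in the comments) is that the partition Priority value
acts as a tier that is compared before any per-job priority; the function and
variable names are made up for illustration. The job priorities are three of
the PRIORITY values from the sprio output quoted below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingJob:
    job_id: int
    partition: str
    job_priority: int  # PRIORITY column from the sprio output


# Partition Priority= values from the slurm.conf lines quoted below.
# ASSUMPTION: this value acts as a scheduling tier for the whole partition.
PARTITION_TIER = {"Priority": 1000, "Background": 10}

# Three of the pending jobs, priorities copied from the sprio output.
PENDING = [
    PendingJob(448, "Background", 13667),
    PendingJob(469, "Background", 10013157),  # the job with qos=boost
    PendingJob(492, "Priority", 1003158),
]


def by_job_priority(jobs):
    """Ordering I expected: highest job priority first."""
    ordered = sorted(jobs, key=lambda j: -j.job_priority)
    return [j.job_id for j in ordered]


def by_tier_then_priority(jobs, tiers):
    """Ordering I observe: partition tier first, job priority within a tier."""
    ordered = sorted(jobs, key=lambda j: (-tiers[j.partition], -j.job_priority))
    return [j.job_id for j in ordered]


if __name__ == "__main__":
    print(by_job_priority(PENDING))
    print(by_tier_then_priority(PENDING, PARTITION_TIER))
    # Hypothetical: both partitions given the same tier, boost left to the qos.
    print(by_tier_then_priority(PENDING, {"Priority": 1, "Background": 1}))
```

In this model, job 492 jumps ahead of job 469 only while the two partitions
have different tier values. With equal tiers the boosted job 469 comes first
again, which matches what I see after moving the priority into a qos.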


Sean



On Mon, 11 Mar 2019 at 17:00, Sean Brisbane <sean.brisbane at securelinx.com>
wrote:

> Hi,
>
> I'm looking to have a way an administrator can boost any job to be next to
> run when resources become available.  What is the best practice way to do
> this? Happy to try something new :-D
>
> The way I thought to do this was to have a qos with a large priority and
> manually assign this to the job.  Job 469 is the job in this example I am
> trying to elevate to be next in queue.
>
> scontrol update jobid=469 qos=boost
>
> sprio shows that this job is the highest priority by quite some way;
> however, job number 492 will be next to run.
>
> squeue (excluding running jobs)
>              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>                469 Backgroun sleeping   centos PD       0:00      1 (Resources)
>                492  Priority sleepy.s superuse PD       0:00      1 (Resources)
>                448 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
>                478 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
>                479 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
>                480 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
>                481 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
>                482 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
>                483 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
>                484 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
>                449 Backgroun sleepy.s superuse PD       0:00      1 (Resources)
>                450 Backgroun sleepy.s superuse PD       0:00      1 (Resources)
>                465 Backgroun sleeping   centos PD       0:00      1 (Resources)
>                466 Backgroun sleeping   centos PD       0:00      1 (Resources)
>                467 Backgroun sleeping   centos PD       0:00      1 (Resources)
>
>
> [root at master yp]# sprio
>           JOBID PARTITION   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS
>             448 Backgroun      13667         58        484       3125      10000          0
>             449 Backgroun      13205         58         23       3125      10000          0
>             450 Backgroun      13205         58         23       3125      10000          0
>             465 Backgroun      13157         32          0       3125      10000          0
>             466 Backgroun      13157         32          0       3125      10000          0
>             467 Backgroun      13157         32          0       3125      10000          0
>             469 Backgroun   10013157         32          0       3125      10000   10000000
>             478 Backgroun      13640         32        484       3125      10000          0
>             479 Backgroun      13640         32        484       3125      10000          0
>             480 Backgroun      13640         32        484       3125      10000          0
>             481 Backgroun      13610         32        454       3125      10000          0
>             482 Backgroun      13610         32        454       3125      10000          0
>             483 Backgroun      13610         32        454       3125      10000          0
>             484 Backgroun      13610         32        454       3125      10000          0
>             492 Priority     1003158         11         23       3125    1000000          0
>
>
> I'm trying to troubleshoot why the highest priority job is not next to
> run. Jobs in the partition called "Priority" seem to run first.
>
> Job 469 has no qos, partition, user account or group limits on the
> number of cpus, jobs, nodes etc.  I've set this test cluster up from scratch
> to be sure!
>
> [root at master yp]# scontrol show job 469
> JobId=469 JobName=sleeping.sh
>    UserId=centos(1000) GroupId=centos(1000) MCS_label=N/A
>    Priority=10013161 Nice=0 Account=default QOS=boost
>    JobState=PENDING Reason=Resources Dependency=(null)
>    Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>    RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
>    SubmitTime=2019-03-11T16:01:20 EligibleTime=2019-03-11T16:01:20
>    StartTime=2020-03-10T15:23:40 EndTime=Unknown Deadline=N/A
>    PreemptTime=None SuspendTime=None SecsPreSuspend=0
>    LastSchedEval=2019-03-11T16:54:44
>    Partition=Background AllocNode:Sid=master:1322
>    ReqNodeList=(null) ExcNodeList=(null)
>    NodeList=(null)
>    NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>    TRES=cpu=1,node=1
>    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>    MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>    Features=(null) DelayBoot=00:00:00
>    Gres=(null) Reservation=(null)
>    OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
>    Command=/home/centos/sleeping.sh
>    WorkDir=/home/centos
>    StdErr=/home/centos/sleeping.sh.e469
>    StdIn=/dev/null
>    StdOut=/home/centos/sleeping.sh.o469
>    Power=
>
> The partition called "Priority" has a priority boost assigned through qos.
>
> PartitionName=Priority Nodes=compute[01-02]  Default=NO MaxTime=INFINITE State=UP Priority=1000 QOS=Priority
> PartitionName=Background Nodes=compute[01-02]   Default=YES MaxTime=INFINITE State=UP Priority=10
>
> Any ideas would be much appreciated.
>
> Sean
>
>
>
> --
>
> --
>
> Sean Brisbane | Linux Systems Specialist
>
> Securelinx Ltd., Pottery Road, Dun Laoghaire, Co. Dublin.
> Registered in Ireland No. 357396
> www.securelinx.com <http://www.securelinx.com/> - Linux Leaders in Ireland
>

