[slurm-users] How to force jobs to run next in queue

Thomas M. Payerle payerle at umd.edu
Tue Mar 12 16:51:02 UTC 2019


Are you using the priority/multifactor plugin?  What are the values of the
various Priority* weight factors?
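
For example, something along these lines will show which plugin is in use and
the configured weights (exact key names can vary a little between Slurm versions):

scontrol show config | grep -E 'PriorityType|PriorityWeight'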

On Tue, Mar 12, 2019 at 12:42 PM Sean Brisbane <sean.brisbane at securelinx.com>
wrote:

> Hi,
>
> Thanks for your help.
>
> Neither setting a qos nor setting a priority works for me.  However, I
> have found the cause, if not the reason.
>
> Using a Priority setting on the partition called "Priority" in slurm.conf
> seems to force all jobs waiting in that partition to run first, regardless of
> any qos set on a job.  Priority is not a limit, but I think this is a bit
> inconsistent with the limit hierarchy we see elsewhere, and possibly even a
> bug.  For reference, the limit hierarchy is:
>
> 1. Partition QOS limit
> 2. Job QOS limit
> 3. User association
> 4. Account association(s), ascending the hierarchy
> 5. Root/Cluster association
> 6. Partition limit
> 7. None
>
> So for multiple partitions with differing priorities, I can get the same
> effect by moving the priority into a qos, applying that qos to the partition,
> and then taking care to set the OverPartQOS flag on the "boost" qos.
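>
> Roughly what I mean, as an untested sketch (the qos names just mirror the ones
> from my test setup):
>
> sacctmgr modify qos name=Priority set Priority=1000     # carry the boost on the partition's qos
> sacctmgr modify qos name=boost set Flags=OverPartQOS    # let the job-level "boost" qos override the partition qos
>
> and then drop Priority=1000 from the PartitionName=Priority line in slurm.conf.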
>
> Does anyone have a feeling for why setting a high Priority on a partition
> makes jobs in that partition run first, even when a job in a different
> partition may have a much higher overall priority?
>
>
> Sean
>
>
>
> On Mon, 11 Mar 2019 at 17:00, Sean Brisbane <sean.brisbane at securelinx.com>
> wrote:
>
>> Hi,
>>
>> I'm looking for a way for an administrator to boost any job to be the next
>> to run when resources become available.  What is the best-practice way to
>> do this? Happy to try something new :-D
>>
>> The way I thought to do this was to have a qos with a large priority and
>> manually assign it to the job.  Job 469 is the job in this example that I am
>> trying to elevate to be next in the queue.
>>
>> scontrol update jobid=469 qos=boost
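>>
>> As a sanity check, the qos definition and the job's resulting priority can be
>> inspected with something like:
>>
>> sacctmgr show qos boost format=Name,Priority,Flags
>> squeue -j 469 -o "%i %q %Q"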
>>
>> sprio shows that this job has the highest priority by quite some way;
>> however, job number 492 will be next to run.
>>
>> squeue (excluding running jobs):
>>              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>>                469 Backgroun sleeping   centos PD       0:00      1 (Resources)
>>                492  Priority sleepy.s superuse PD       0:00      1 (Resources)
>>                448 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
>>                478 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
>>                479 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
>>                480 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
>>                481 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
>>                482 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
>>                483 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
>>                484 Backgroun sleepy.s groupboo PD       0:00      1 (Resources)
>>                449 Backgroun sleepy.s superuse PD       0:00      1 (Resources)
>>                450 Backgroun sleepy.s superuse PD       0:00      1 (Resources)
>>                465 Backgroun sleeping   centos PD       0:00      1 (Resources)
>>                466 Backgroun sleeping   centos PD       0:00      1 (Resources)
>>                467 Backgroun sleeping   centos PD       0:00      1 (Resources)
>>
>>
>> [root at master yp]# sprio
>>           JOBID PARTITION   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS
>>             448 Backgroun      13667         58        484       3125      10000          0
>>             449 Backgroun      13205         58         23       3125      10000          0
>>             450 Backgroun      13205         58         23       3125      10000          0
>>             465 Backgroun      13157         32          0       3125      10000          0
>>             466 Backgroun      13157         32          0       3125      10000          0
>>             467 Backgroun      13157         32          0       3125      10000          0
>>             469 Backgroun   10013157         32          0       3125      10000   10000000
>>             478 Backgroun      13640         32        484       3125      10000          0
>>             479 Backgroun      13640         32        484       3125      10000          0
>>             480 Backgroun      13640         32        484       3125      10000          0
>>             481 Backgroun      13610         32        454       3125      10000          0
>>             482 Backgroun      13610         32        454       3125      10000          0
>>             483 Backgroun      13610         32        454       3125      10000          0
>>             484 Backgroun      13610         32        454       3125      10000          0
>>             492  Priority    1003158         11         23       3125    1000000          0
>>
>>
>> I'm trying to troubleshoot why the highest-priority job is not the next to
>> run; jobs in the partition called "Priority" seem to run first.
>>
>> Job 469 has no qos, partition, user, account, or group limits on the
>> number of cpus, jobs, nodes, etc.  I've set this test cluster up from
>> scratch to be sure!
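>>
>> For reference, the configured factor weights and the per-factor breakdown for
>> the two jobs can be pulled with something like:
>>
>> sprio -w
>> sprio -l -j 469,492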
>>
>> [root at master yp]# scontrol show job 469
>> JobId=469 JobName=sleeping.sh
>>    UserId=centos(1000) GroupId=centos(1000) MCS_label=N/A
>>    Priority=10013161 Nice=0 Account=default QOS=boost
>>    JobState=PENDING Reason=Resources Dependency=(null)
>>    Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>>    RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
>>    SubmitTime=2019-03-11T16:01:20 EligibleTime=2019-03-11T16:01:20
>>    StartTime=2020-03-10T15:23:40 EndTime=Unknown Deadline=N/A
>>    PreemptTime=None SuspendTime=None SecsPreSuspend=0
>>    LastSchedEval=2019-03-11T16:54:44
>>    Partition=Background AllocNode:Sid=master:1322
>>    ReqNodeList=(null) ExcNodeList=(null)
>>    NodeList=(null)
>>    NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>>    TRES=cpu=1,node=1
>>    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>>    MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>>    Features=(null) DelayBoot=00:00:00
>>    Gres=(null) Reservation=(null)
>>    OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
>>    Command=/home/centos/sleeping.sh
>>    WorkDir=/home/centos
>>    StdErr=/home/centos/sleeping.sh.e469
>>    StdIn=/dev/null
>>    StdOut=/home/centos/sleeping.sh.o469
>>    Power=
>>
>> The partition called "Priority" has a priority boost assigned through qos.
>>
>> PartitionName=Priority   Nodes=compute[01-02] Default=NO  MaxTime=INFINITE State=UP Priority=1000 QOS=Priority
>> PartitionName=Background Nodes=compute[01-02] Default=YES MaxTime=INFINITE State=UP Priority=10
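>>
>> The qos attached to that partition can be inspected with something like:
>>
>> sacctmgr show qos Priority format=Name,Priority,Flags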
>>
>> Any ideas would be much appreciated.
>>
>> Sean
>>
>>
>>
>> --
>>
>> Sean Brisbane | Linux Systems Specialist
>>
>> Securelinx Ltd., Pottery Road, Dun Laoghaire, Co. Dublin.
>> Registered in Ireland No. 357396
>> www.securelinx.com - Linux Leaders in Ireland
>>
>
>
> --
>
> Sean Brisbane | Linux Systems Specialist
> Mobile: +353(0)87 627 3024 | Office: +353 1 5065 615 (ext 610)
>
> Securelinx Ltd., Pottery Road, Dun Laoghaire, Co. Dublin.
> Registered in Ireland No. 357396
> www.securelinx.com - Linux Leaders in Ireland
>


-- 
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads        payerle at umd.edu
5825 University Research Park               (301) 405-6135
University of Maryland
College Park, MD 20740-3831