[slurm-users] How to force jobs to run next in queue
Thomas M. Payerle
payerle at umd.edu
Tue Mar 12 16:51:02 UTC 2019
Are you using the priority/multifactor plugin? What are the values of the
various Priority* weight factors?
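
If you are not sure, something along these lines should show the plugin in use
and the configured weights (the grep pattern is just illustrative):

    scontrol show config | grep '^Priority'
    sprio -w
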
On Tue, Mar 12, 2019 at 12:42 PM Sean Brisbane <sean.brisbane at securelinx.com>
wrote:
> Hi,
>
> Thanks for your help.
>
> Neither setting a qos nor setting a priority on the job works for me.
> However, I have found the cause, if not the reason.
>
> Using a Priority setting on the partition called "Priority" in slurm.conf
> seems to force all jobs waiting in that partition to run first, regardless of
> any qos set on the job. Priority is not a limit, but I think this is a bit
> inconsistent with the limit hierarchy we see elsewhere (below), and is
> possibly even a bug.
>
> 1. Partition QOS limit
> 2. Job QOS limit
> 3. User association
> 4. Account association(s), ascending the hierarchy
> 5. Root/Cluster association
> 6. Partition limit
> 7. None
>
> So, for multiple partitions with differing priorities, I can get the same
> effect by moving the priority into a qos, applying that qos to the partition,
> and then taking care to set the OverPartQOS flag on the "boost" qos.
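>
> For reference, the workaround looks roughly like this (the qos names and the
> priority value are just the ones from my test setup, so treat it as a sketch):
>
>   # move the partition's priority into a qos and attach that qos to the partition
>   sacctmgr add qos Priority
>   sacctmgr modify qos where name=Priority set priority=1000000
>
>   # slurm.conf then references the qos instead of carrying a large Priority= value:
>   # PartitionName=Priority Nodes=compute[01-02] Default=NO MaxTime=INFINITE State=UP QOS=Priority
>
>   # allow the per-job "boost" qos to override the partition qos (OverPartQOS flag)
>   sacctmgr modify qos where name=boost set flags=OverPartQOS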
>
> Does anyone have a feeling for why setting a high Priority on a partition
> makes jobs in that partition run first, even when a job in a different
> partition has a much higher overall priority?
>
>
> Sean
>
>
>
> On Mon, 11 Mar 2019 at 17:00, Sean Brisbane <sean.brisbane at securelinx.com>
> wrote:
>
>> Hi,
>>
>> I'm looking for a way for an administrator to boost any job so that it runs
>> next when resources become available. What is the best-practice way to do
>> this? Happy to try something new :-D
>>
>> The way I thought to do this was to have a qos with a large priority and to
>> assign it to the job manually. Job 469 is the job I am trying to elevate to
>> the front of the queue in this example:
>>
>> scontrol update jobid=469 qos=boost
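>>
>> (The boost qos itself isn't shown above; it was set up along these lines, so
>> treat the exact values as illustrative. The priority just needs to dwarf the
>> other factors, and the user's association has to be allowed to use the qos.)
>>
>> sacctmgr add qos boost
>> sacctmgr modify qos where name=boost set priority=10000000
>> sacctmgr modify user where name=centos set qos+=boost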
>>
>> sprio shows that this job has the highest priority by quite some way;
>> however, job number 492 will be next to run.
>>
>> squeue (excluding running jobs):
>> JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
>>   469 Backgroun sleeping   centos PD  0:00     1 (Resources)
>>   492 Priority  sleepy.s superuse PD  0:00     1 (Resources)
>>   448 Backgroun sleepy.s groupboo PD  0:00     1 (Resources)
>>   478 Backgroun sleepy.s groupboo PD  0:00     1 (Resources)
>>   479 Backgroun sleepy.s groupboo PD  0:00     1 (Resources)
>>   480 Backgroun sleepy.s groupboo PD  0:00     1 (Resources)
>>   481 Backgroun sleepy.s groupboo PD  0:00     1 (Resources)
>>   482 Backgroun sleepy.s groupboo PD  0:00     1 (Resources)
>>   483 Backgroun sleepy.s groupboo PD  0:00     1 (Resources)
>>   484 Backgroun sleepy.s groupboo PD  0:00     1 (Resources)
>>   449 Backgroun sleepy.s superuse PD  0:00     1 (Resources)
>>   450 Backgroun sleepy.s superuse PD  0:00     1 (Resources)
>>   465 Backgroun sleeping   centos PD  0:00     1 (Resources)
>>   466 Backgroun sleeping   centos PD  0:00     1 (Resources)
>>   467 Backgroun sleeping   centos PD  0:00     1 (Resources)
>>
>>
>> [root at master yp]# sprio
>> JOBID PARTITION   PRIORITY  AGE  FAIRSHARE  JOBSIZE  PARTITION       QOS
>>   448 Backgroun      13667   58        484     3125      10000         0
>>   449 Backgroun      13205   58         23     3125      10000         0
>>   450 Backgroun      13205   58         23     3125      10000         0
>>   465 Backgroun      13157   32          0     3125      10000         0
>>   466 Backgroun      13157   32          0     3125      10000         0
>>   467 Backgroun      13157   32          0     3125      10000         0
>>   469 Backgroun   10013157   32          0     3125      10000  10000000
>>   478 Backgroun      13640   32        484     3125      10000         0
>>   479 Backgroun      13640   32        484     3125      10000         0
>>   480 Backgroun      13640   32        484     3125      10000         0
>>   481 Backgroun      13610   32        454     3125      10000         0
>>   482 Backgroun      13610   32        454     3125      10000         0
>>   483 Backgroun      13610   32        454     3125      10000         0
>>   484 Backgroun      13610   32        454     3125      10000         0
>>   492 Priority     1003158   11         23     3125    1000000         0
>>
>>
>> I'm trying to troubleshoot why the highest-priority job is not next to run;
>> jobs in the partition called "Priority" seem to run first.
>>
>> Job 469 has no qos, partition, user, account, or group limits on the number
>> of CPUs, jobs, nodes, etc. I've set this test cluster up from scratch to be
>> sure!
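>>
>> (A quick way to double-check is something like the following; the format
>> fields are just a convenient subset.)
>>
>> sacctmgr show assoc where user=centos format=User,Account,Partition,QOS,MaxJobs,MaxSubmit,GrpTRES
>> sacctmgr show qos format=Name,Priority,Flags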
>>
>> [root at master yp]# scontrol show job 469
>> JobId=469 JobName=sleeping.sh
>> UserId=centos(1000) GroupId=centos(1000) MCS_label=N/A
>> Priority=10013161 Nice=0 Account=default QOS=boost
>> JobState=PENDING Reason=Resources Dependency=(null)
>> Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>> RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
>> SubmitTime=2019-03-11T16:01:20 EligibleTime=2019-03-11T16:01:20
>> StartTime=2020-03-10T15:23:40 EndTime=Unknown Deadline=N/A
>> PreemptTime=None SuspendTime=None SecsPreSuspend=0
>> LastSchedEval=2019-03-11T16:54:44
>> Partition=Background AllocNode:Sid=master:1322
>> ReqNodeList=(null) ExcNodeList=(null)
>> NodeList=(null)
>> NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>> TRES=cpu=1,node=1
>> Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>> MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>> Features=(null) DelayBoot=00:00:00
>> Gres=(null) Reservation=(null)
>> OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
>> Command=/home/centos/sleeping.sh
>> WorkDir=/home/centos
>> StdErr=/home/centos/sleeping.sh.e469
>> StdIn=/dev/null
>> StdOut=/home/centos/sleeping.sh.o469
>> Power=
>>
>> The partition called "Priority" has a priority boost assigned through qos.
>>
>> PartitionName=Priority Nodes=compute[01-02] Default=NO MaxTime=INFINITE State=UP Priority=1000 QOS=Priority
>> PartitionName=Background Nodes=compute[01-02] Default=YES MaxTime=INFINITE State=UP Priority=10
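>>
>> (For completeness, "scontrol show partition Priority" and "scontrol show
>> partition Background" show what the scheduler actually picked up; depending
>> on the Slurm version, the Priority= value may be reported there as both
>> PriorityJobFactor and PriorityTier.)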
>>
>> Any ideas would be much appreciated.
>>
>> Sean
>>
>>
>>
>> --
>>
>> --
>>
>> Sean Brisbane | Linux Systems Specialist
>>
>> Securelinx Ltd., Pottery Road, Dun Laoghaire, Co. Dublin.
>> Registered in Ireland No. 357396
>> www.securelinx.com <http://www.securelinx.com/> - Linux Leaders in
>> Ireland
>>
>
>
> --
>
> --
>
> Sean Brisbane | Linux Systems Specialist
> Mobile: +353(0)87 627 3024 | Office: +353 1 5065 615 (ext 610)
>
> Securelinx Ltd., Pottery Road, Dun Laoghaire, Co. Dublin.
> Registered in Ireland No. 357396
> www.securelinx.com <http://www.securelinx.com/> - Linux Leaders in Ireland
>
--
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads payerle at umd.edu
5825 University Research Park (301) 405-6135
University of Maryland
College Park, MD 20740-3831