[slurm-users] How to force jobs to run next in queue
Sean Brisbane
sean.brisbane at securelinx.com
Tue Mar 12 16:39:19 UTC 2019
Hi,
Thanks for your help.
Neither setting a QOS nor setting the job priority directly works for me.
However, I have found the cause, if not the reason.
Setting Priority on the partition called "Priority" in slurm.conf seems to
force all jobs waiting in that partition to run first, regardless of any QOS
set on a job. Priority is not a limit, but I think this is somewhat
inconsistent with the limit hierarchy we see elsewhere, and possibly even a
bug. For reference, that hierarchy is:
1. Partition QOS limit
2. Job QOS limit
3. User association
4. Account association(s), ascending the hierarchy
5. Root/Cluster association
6. Partition limit
7. None
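To double-check which setting is actually taking effect, the partition
configuration can be dumped directly. A quick sketch (the partition names
match my test setup; the exact field names vary between Slurm versions):

```shell
# Print only the priority-related fields of each partition.
# Newer Slurm versions report PriorityJobFactor= and PriorityTier=
# instead of a single Priority= field.
scontrol show partition Priority | tr ' ' '\n' | grep '^Priority'
scontrol show partition Background | tr ' ' '\n' | grep '^Priority'
```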
So, for multiple partitions with differing priorities, I can get the same
effect by moving the priority into a QOS, applying that QOS to the partition,
and taking care to set the OverPartQOS flag on the "boost" QOS.
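For reference, a minimal sketch of that workaround; the QOS names here are
just the ones I used on my test cluster:

```shell
# A QOS carrying what used to be the partition's Priority value...
sacctmgr -i add qos part_boost Priority=1000000
# ...and an admin "boost" QOS whose priority overrides the partition
# QOS -- the OverPartQOS flag is what makes it win.
sacctmgr -i add qos boost Priority=10000000 Flags=OverPartQOS
```

In slurm.conf the partition then gets QOS=part_boost instead of a Priority=
setting, and an administrator can still boost a job with
"scontrol update jobid=<id> qos=boost".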
Does anyone have a feel for why setting a high Priority on a partition makes
jobs in that partition run first, even when a job in a different partition
has a much higher overall priority?
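For what it's worth, the ordering I expected is simply a numeric sort on the
PRIORITY column of sprio. A quick sketch using three sample rows taken from
the listing in my previous mail (JOBID, PARTITION, PRIORITY only):

```shell
# Sample rows: job id, partition, total multifactor priority.
sprio_sample='469 Backgroun 10013157
492 Priority 1003158
448 Backgroun 13667'

# Sort by the PRIORITY column (field 3), highest first; by priority
# alone, job 469 should be scheduled before 492.
echo "$sprio_sample" | sort -k3,3 -nr | head -n 1
# -> 469 Backgroun 10013157
```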
Sean
On Mon, 11 Mar 2019 at 17:00, Sean Brisbane <sean.brisbane at securelinx.com>
wrote:
> Hi,
>
> I'm looking for a way for an administrator to boost any job so that it is
> next to run when resources become available. What is the best-practice way
> to do this? Happy to try something new :-D
>
> The way I thought to do this was to create a QOS with a large priority and
> manually assign it to the job. In this example, job 469 is the job I am
> trying to elevate to the front of the queue.
>
> scontrol update jobid=469 qos=boost
>
> sprio shows that this job has the highest priority by quite some way;
> however, job number 492 will be next to run.
>
> squeue (excluding running jobs):
> JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
>   469 Backgroun sleeping   centos PD  0:00     1 (Resources)
>   492 Priority  sleepy.s superuse PD  0:00     1 (Resources)
>   448 Backgroun sleepy.s groupboo PD  0:00     1 (Resources)
>   478 Backgroun sleepy.s groupboo PD  0:00     1 (Resources)
>   479 Backgroun sleepy.s groupboo PD  0:00     1 (Resources)
>   480 Backgroun sleepy.s groupboo PD  0:00     1 (Resources)
>   481 Backgroun sleepy.s groupboo PD  0:00     1 (Resources)
>   482 Backgroun sleepy.s groupboo PD  0:00     1 (Resources)
>   483 Backgroun sleepy.s groupboo PD  0:00     1 (Resources)
>   484 Backgroun sleepy.s groupboo PD  0:00     1 (Resources)
>   449 Backgroun sleepy.s superuse PD  0:00     1 (Resources)
>   450 Backgroun sleepy.s superuse PD  0:00     1 (Resources)
>   465 Backgroun sleeping   centos PD  0:00     1 (Resources)
>   466 Backgroun sleeping   centos PD  0:00     1 (Resources)
>   467 Backgroun sleeping   centos PD  0:00     1 (Resources)
>
>
> [root at master yp]# sprio
> JOBID PARTITION  PRIORITY  AGE  FAIRSHARE  JOBSIZE  PARTITION      QOS
>   448 Backgroun     13667   58        484     3125      10000        0
>   449 Backgroun     13205   58         23     3125      10000        0
>   450 Backgroun     13205   58         23     3125      10000        0
>   465 Backgroun     13157   32          0     3125      10000        0
>   466 Backgroun     13157   32          0     3125      10000        0
>   467 Backgroun     13157   32          0     3125      10000        0
>   469 Backgroun  10013157   32          0     3125      10000 10000000
>   478 Backgroun     13640   32        484     3125      10000        0
>   479 Backgroun     13640   32        484     3125      10000        0
>   480 Backgroun     13640   32        484     3125      10000        0
>   481 Backgroun     13610   32        454     3125      10000        0
>   482 Backgroun     13610   32        454     3125      10000        0
>   483 Backgroun     13610   32        454     3125      10000        0
>   484 Backgroun     13610   32        454     3125      10000        0
>   492 Priority    1003158   11         23     3125    1000000        0
>
>
> I'm trying to troubleshoot why the highest-priority job is not next to
> run; jobs in the partition called "Priority" seem to run first.
>
> Job 469 has no QOS, partition, user, account, or group limits on the
> number of CPUs, jobs, nodes, etc. I've set this test cluster up from
> scratch to be sure!
>
> [root at master yp]# scontrol show job 469
> JobId=469 JobName=sleeping.sh
> UserId=centos(1000) GroupId=centos(1000) MCS_label=N/A
> Priority=10013161 Nice=0 Account=default QOS=boost
> JobState=PENDING Reason=Resources Dependency=(null)
> Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
> RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
> SubmitTime=2019-03-11T16:01:20 EligibleTime=2019-03-11T16:01:20
> StartTime=2020-03-10T15:23:40 EndTime=Unknown Deadline=N/A
> PreemptTime=None SuspendTime=None SecsPreSuspend=0
> LastSchedEval=2019-03-11T16:54:44
> Partition=Background AllocNode:Sid=master:1322
> ReqNodeList=(null) ExcNodeList=(null)
> NodeList=(null)
> NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
> TRES=cpu=1,node=1
> Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
> MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
> Features=(null) DelayBoot=00:00:00
> Gres=(null) Reservation=(null)
> OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
> Command=/home/centos/sleeping.sh
> WorkDir=/home/centos
> StdErr=/home/centos/sleeping.sh.e469
> StdIn=/dev/null
> StdOut=/home/centos/sleeping.sh.o469
> Power=
>
> The partition called "Priority" has a priority boost assigned through QOS.
>
> PartitionName=Priority Nodes=compute[01-02] Default=NO MaxTime=INFINITE
> State=UP Priority=1000 QOS=Priority
> PartitionName=Background Nodes=compute[01-02] Default=YES
> MaxTime=INFINITE State=UP Priority=10
>
> Any ideas would be much appreciated.
>
> Sean
>
>
>
--
Sean Brisbane | Linux Systems Specialist
Mobile: +353(0)87 627 3024 | Office: +353 1 5065 615 (ext 610)
Securelinx Ltd., Pottery Road, Dun Laoghaire, Co. Dublin.
Registered in Ireland No. 357396
www.securelinx.com <http://www.securelinx.com/> - Linux Leaders in Ireland