[slurm-users] How to force jobs to run next in queue
Sean Brisbane
sean.brisbane at securelinx.com
Mon Mar 11 17:00:36 UTC 2019
Hi,
I'm looking for a way for an administrator to boost any job so that it is
next to run when resources become available. What is the best-practice way
to do this? Happy to try something new :-D
The way I thought to do this was to have a qos with a large priority and
manually assign this to the job. Job 469 is the job in this example I am
trying to elevate to be next in queue.
scontrol update jobid=469 qos=boost
sprio shows that this job has the highest priority by quite some way;
however, job number 492 will be next to run.
squeue (excluding running jobs):
JOBID PARTITION NAME     USER     ST TIME NODES NODELIST(REASON)
  469 Backgroun sleeping centos   PD 0:00     1 (Resources)
  492 Priority  sleepy.s superuse PD 0:00     1 (Resources)
  448 Backgroun sleepy.s groupboo PD 0:00     1 (Resources)
  478 Backgroun sleepy.s groupboo PD 0:00     1 (Resources)
  479 Backgroun sleepy.s groupboo PD 0:00     1 (Resources)
  480 Backgroun sleepy.s groupboo PD 0:00     1 (Resources)
  481 Backgroun sleepy.s groupboo PD 0:00     1 (Resources)
  482 Backgroun sleepy.s groupboo PD 0:00     1 (Resources)
  483 Backgroun sleepy.s groupboo PD 0:00     1 (Resources)
  484 Backgroun sleepy.s groupboo PD 0:00     1 (Resources)
  449 Backgroun sleepy.s superuse PD 0:00     1 (Resources)
  450 Backgroun sleepy.s superuse PD 0:00     1 (Resources)
  465 Backgroun sleeping centos   PD 0:00     1 (Resources)
  466 Backgroun sleeping centos   PD 0:00     1 (Resources)
  467 Backgroun sleeping centos   PD 0:00     1 (Resources)
[root@master yp]# sprio
JOBID PARTITION PRIORITY AGE FAIRSHARE JOBSIZE PARTITION      QOS
  448 Backgroun    13667  58       484    3125     10000        0
  449 Backgroun    13205  58        23    3125     10000        0
  450 Backgroun    13205  58        23    3125     10000        0
  465 Backgroun    13157  32         0    3125     10000        0
  466 Backgroun    13157  32         0    3125     10000        0
  467 Backgroun    13157  32         0    3125     10000        0
  469 Backgroun 10013157  32         0    3125     10000 10000000
  478 Backgroun    13640  32       484    3125     10000        0
  479 Backgroun    13640  32       484    3125     10000        0
  480 Backgroun    13640  32       484    3125     10000        0
  481 Backgroun    13610  32       454    3125     10000        0
  482 Backgroun    13610  32       454    3125     10000        0
  483 Backgroun    13610  32       454    3125     10000        0
  484 Backgroun    13610  32       454    3125     10000        0
  492 Priority   1003158  11        23    3125   1000000        0
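As a sanity check on the sprio output, the factor columns can be summed and compared with the reported PRIORITY value. The following is a minimal Python sketch using three rows copied from the table. The variable and function names are made up for this illustration, and the tolerance of 1 assumes that the small mismatch for job 492 is display rounding.

```python
# Factor columns copied from the sprio output:
# (age, fairshare, jobsize, partition, qos)
factors = {
    448: (58, 484, 3125, 10000, 0),
    469: (32, 0, 3125, 10000, 10000000),
    492: (11, 23, 3125, 1000000, 0),
}

# PRIORITY column as reported by sprio for the same jobs
reported = {448: 13667, 469: 10013157, 492: 1003158}


def total(job_id):
    """Sum the displayed factor columns for one job."""
    return sum(factors[job_id])


for job_id in factors:
    # ASSUMPTION: a difference of 1 is treated as display rounding
    diff = abs(total(job_id) - reported[job_id])
    print(job_id, total(job_id), reported[job_id], diff <= 1)

# The job with the largest reported priority
highest = max(reported, key=reported.get)
print("highest priority job:", highest)
```

For these rows the sums agree with the reported values to within 1, and job 469 has the largest total, so the QOS boost itself appears to have been applied.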
I'm trying to troubleshoot why the highest-priority job is not next to run.
Jobs in the partition called "Priority" seem to run first.
Job 469 has no QOS, partition, user account or group limits on the
number of CPUs, jobs, nodes, etc. I've set this test cluster up from scratch
to be sure!
[root@master yp]# scontrol show job 469
JobId=469 JobName=sleeping.sh
UserId=centos(1000) GroupId=centos(1000) MCS_label=N/A
Priority=10013161 Nice=0 Account=default QOS=boost
JobState=PENDING Reason=Resources Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2019-03-11T16:01:20 EligibleTime=2019-03-11T16:01:20
StartTime=2020-03-10T15:23:40 EndTime=Unknown Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
LastSchedEval=2019-03-11T16:54:44
Partition=Background AllocNode:Sid=master:1322
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null)
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,node=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
Gres=(null) Reservation=(null)
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/centos/sleeping.sh
WorkDir=/home/centos
StdErr=/home/centos/sleeping.sh.e469
StdIn=/dev/null
StdOut=/home/centos/sleeping.sh.o469
Power=
The partition called "Priority" has a priority boost assigned through qos.
PartitionName=Priority Nodes=compute[01-02] Default=NO MaxTime=INFINITE State=UP Priority=1000 QOS=Priority
PartitionName=Background Nodes=compute[01-02] Default=YES MaxTime=INFINITE State=UP Priority=10
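One hypothesis worth testing is that the partition-level Priority value also acts as a scheduling tier, so that pending jobs are ordered by partition tier first and by job priority only within a tier. This is an assumption about this configuration, not something the output in this message confirms. The Python sketch below models that ordering. `PendingJob` and `dispatch_order` are hypothetical names for illustration and are not part of Slurm.

```python
from typing import NamedTuple


class PendingJob(NamedTuple):
    job_id: int
    partition_tier: int  # ASSUMPTION: the partition Priority= value acts as a tier
    job_priority: int    # PRIORITY column from sprio


def dispatch_order(jobs):
    """Sort by partition tier first, then by job priority, both descending."""
    return sorted(jobs, key=lambda j: (-j.partition_tier, -j.job_priority))


pending = [
    PendingJob(469, 10, 10013157),   # Background partition, boosted QOS
    PendingJob(492, 1000, 1003158),  # Priority partition
    PendingJob(448, 10, 13667),      # Background partition
]

order = [job.job_id for job in dispatch_order(pending)]
print(order)
```

Under that assumption job 492 sorts ahead of job 469 despite its lower job priority, which would match the behaviour described here.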
Any ideas would be much appreciated.
Sean
--
Sean Brisbane | Linux Systems Specialist
Securelinx Ltd., Pottery Road, Dun Laoghaire, Co. Dublin.
Registered in Ireland No. 357396
www.securelinx.com <http://www.securelinx.com/> - Linux Leaders in Ireland