[slurm-users] estimate queue time using 'sbatch --test-only'
Renfro, Michael
Renfro at tntech.edu
Wed Sep 15 20:24:25 UTC 2021
I can imagine at least the following causing differences between the estimated and actual start times:
* If running users have overestimated their job times, and their jobs finish earlier than expected, the original estimate will be high.
* If another user's job submission gets higher priority than yours while your job is still pending (because of scheduler policy including fairshare), your job can get pushed back, and the original estimate will be low.
* If the test-only scheduling code doesn't account for backfill, the original estimate could be high.
I haven't looked at the code to see whether the test-only option goes through a complete scheduling cycle before returning the estimate, but I can guarantee that the first two items above happen all the time on my much simpler cluster here.
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Feng Li <li2251 at purdue.edu>
Date: Wednesday, September 15, 2021 at 3:14 PM
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
Subject: [slurm-users] estimate queue time using 'sbatch --test-only'
Hi and thanks for reading this!
I am trying to estimate the queue time of a job of a certain size and walltime limit. I am doing this because our project considers multiple HPC resources and needs estimated queue time information to decide where to actually submit the job.
From the man page of ‘sbatch’, I found that the “test-only” option can be used to “validate the batch script and return an estimate of when a job would be scheduled to run given the current job queue and all the other arguments specifying the job requirements”. This looks very promising to us.
I tried several launches on the IU BigRed3 and TACC Stampede2 systems; the recorded results are shown below (the last two columns are the estimated and actual queue times). From the results, the estimated time looks quite inaccurate (it can be either over-estimated or under-estimated):
-----start of output
site       slurm version  partition   JobID    node  np  walltime_mins  timestamp_estimate  estimated_start  submit_time      actual_start     estimated_wait  actual_wait
stampede2  18.08.5-2      skx-normal  8436162  1     48  10             9/9/2021 16:05      9/11/2021 23:29  9/9/2021 16:08   9/9/2021 16:11   55:23:56        0:02:49
Stampede2  18.08.5-2      skx-normal  8436369  1     48  10             9/9/2021 16:51      9/12/2021 0:04   9/9/2021 16:51   9/9/2021 16:52   55:13:00        0:00:58
Stampede2  18.08.5-2      normal      8436193  1     48  10             9/9/2021 16:17      9/9/2021 18:02   9/9/2021 16:19   9/9/2021 16:19   1:45:26         0:00:02
Stampede2  18.08.5-2      normal      8436308  2     48  10             9/9/2021 16:40      9/9/2021 18:25   9/9/2021 16:41   9/9/2021 16:41   1:45:00         0:00:04
Bigred3    20.11.7        general     1727144  1     24  10             9/9/2021 17:57      9/10/2021 12:39  9/9/2021 17:59   9/9/2021 17:59   18:42:00        0:00:00
Bigred3    20.11.7        general     1734075  1     24  60             9/15/2021 14:54     9/15/2021 14:54  9/15/2021 14:54  9/15/2021 15:01  0:00:00         0:07:11
Bigred3    20.11.7        general     1734079  1     24  20             9/15/2021 15:09     9/15/2021 15:09  9/15/2021 15:09  9/15/2021 15:09  0:00:00         0:00:01
Bigred3    20.11.7        general     1734081  4     24  60             9/15/2021 15:11     9/15/2021 15:11  9/15/2021 15:11  9/15/2021 15:34  0:00:00         0:22:15
-----end of output
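For reference, the two wait columns appear to follow from the four timestamp columns: estimated_wait = estimated_start - timestamp_estimate, and actual_wait = actual_start - submit_time. Here is a minimal Python sketch of that arithmetic for job 8436162. The helper names (wait, hms) are hypothetical, and since the timestamps listed above carry only minute precision, the results agree with the recorded waits only to within about a minute.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the results above, e.g. "9/9/2021 16:05".
FMT = "%m/%d/%Y %H:%M"


def wait(earlier: str, later: str) -> timedelta:
    """Return the time elapsed between two timestamps in the results format."""
    return datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)


def hms(delta: timedelta) -> str:
    """Format a timedelta as H:MM:SS, with hours allowed to exceed 24."""
    total = int(delta.total_seconds())
    return f"{total // 3600}:{total % 3600 // 60:02d}:{total % 60:02d}"


# Job 8436162 (Stampede2, skx-normal).
estimated = wait("9/9/2021 16:05", "9/11/2021 23:29")  # timestamp_estimate -> estimated_start
actual = wait("9/9/2021 16:08", "9/9/2021 16:11")      # submit_time -> actual_start

print(hms(estimated), hms(actual))  # 55:24:00 0:03:00 (recorded: 55:23:56 and 0:02:49)
```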
Could you suggest better ways to estimate the queue time? Or are there any specific configurations or situations on those systems that might affect the queue time estimation (e.g. fair sharing and site-specific QoS settings)?
Below is an example of my measurement for your information:
-----begin of example
lifen@elogin1(:):~$date && sbatch --test-only -n 24 -N 4 -p general -t 00:60:00 --wrap "hostname"
Wed Sep 15 15:11:49 EDT 2021
sbatch: Job 1734080 to start at 2021-09-15T15:11:49 using 24 processors on nodes nid00[935-938] in partition general
lifen@elogin1(:):~$date && sbatch -n 24 -N 4 -p general -t 00:60:00 --wrap "hostname"
Wed Sep 15 15:11:58 EDT 2021
Submitted batch job 1734081
lifen@elogin1(:):~$sacct --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist -j 1734081
     User        JobID    JobName  Partition      State  Timelimit               Start                 End    Elapsed     MaxRSS  MaxVMSize   NNodes      NCPUS        NodeList
--------- ------------ ---------- ---------- ---------- ---------- ------------------- ------------------- ---------- ---------- ---------- -------- ---------- ---------------
    lifen 1734081            wrap    general  COMPLETED   01:00:00 2021-09-15T15:34:13 2021-09-15T15:34:13   00:00:00                              4         24 nid00[169,883,+
          1734081.bat+      batch             COMPLETED            2021-09-15T15:34:13 2021-09-15T15:34:13   00:00:00      2136K    226420K        1         18        nid00169
          1734081.ext+     extern             COMPLETED            2021-09-15T15:34:13 2021-09-15T15:34:13   00:00:00         4K         4K        4         24 nid00[169,883,+
-----end of example
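The measurement above could be scripted roughly as follows. This is only a sketch in Python: the regular expression is an assumption based on the single sbatch message shown in the example (other Slurm versions or sites may word it differently), and the function names parse_test_only and estimated_wait_seconds are hypothetical. In a real run the message would have to be captured from sbatch (I believe it is written to stderr); here it is pasted in as a string.

```python
import re
from datetime import datetime
from typing import Tuple

# Assumed message shape, taken from the one example above:
#   "sbatch: Job <id> to start at <YYYY-MM-DDTHH:MM:SS> using ..."
START_RE = re.compile(r"Job (\d+) to start at (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def parse_test_only(message: str) -> Tuple[int, datetime]:
    """Extract the job ID and estimated start time from a test-only message."""
    match = START_RE.search(message)
    if match is None:
        raise ValueError(f"unrecognized sbatch --test-only output: {message!r}")
    return int(match.group(1)), datetime.fromisoformat(match.group(2))


def estimated_wait_seconds(message: str, asked_at: datetime) -> float:
    """Seconds from the moment of asking until the estimated start (never negative)."""
    _, start = parse_test_only(message)
    return max(0.0, (start - asked_at).total_seconds())


# Values copied from the example above (local time on the login node).
msg = ("sbatch: Job 1734080 to start at 2021-09-15T15:11:49 using 24 processors "
       "on nodes nid00[935-938] in partition general")
asked_at = datetime(2021, 9, 15, 15, 11, 49)              # `date` before the test-only call
submitted = datetime(2021, 9, 15, 15, 11, 58)             # `date` before the real submission
started = datetime.fromisoformat("2021-09-15T15:34:13")   # Start column from sacct

print(estimated_wait_seconds(msg, asked_at))      # 0.0 seconds estimated
print((started - submitted).total_seconds())      # 1335.0 seconds actual, i.e. 0:22:15
```

All timestamps are treated as naive local times, so a measurement that spans a daylight-saving change would need extra care.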
Thanks,
Feng Li