[slurm-users] Providing users with info on wait time vs. run time

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Thu Sep 15 11:53:04 UTC 2022


On 9/15/22 12:02, Loris Bennett wrote:
> Today I spotted a job which requested an entire node, then had to wait
> for around 16 hours and finally ran, apparently successfully, for less
> than 4 minutes.
> 
> As it currently seems in general fashionable for users round here to
> request the maximum number of cores available on a node without doing
> any scaling experiments or considering backfill, it seems like it would
> be a good idea to provide them with some feedback on wait/run times.
> 
> One option would be to write the information into the Slurm 'out' file
> (currently we insert the output of 'seff').  Another option would be to
> aggregate the times over, say, a month and provide the absolute totals
> and maybe a run-to-wait ratio.
> 
> Has anyone already done anything like this?

Perhaps marginally relevant: The slurmacct script reports an "Average 
queue hours" column which is the waiting time:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/slurmacct

By changing the script, it would be possible to generate a job summary 
that also reports the waiting time divided by the run time.
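
For illustration only, here is a minimal Python sketch of such a per-user 
summary. It is not taken from the slurmacct script. It assumes that sacct 
prints timestamps in its default YYYY-MM-DDTHH:MM:SS form (i.e. 
SLURM_TIME_FORMAT is not set), and it takes waiting time as Start minus 
Submit, which overstates the wait for jobs held by dependencies or a 
deferred begin time. The function names fetch_sacct_lines and summarize 
are hypothetical.

```python
import subprocess
from collections import defaultdict
from datetime import datetime

# Assumed field selection; all four are standard sacct format fields.
SACCT_FIELDS = "User,Submit,Start,End"


def fetch_sacct_lines(start, end):
    """Return raw pipe-separated sacct lines for all users in [start, end]."""
    cmd = ["sacct", "-a", "-X", "-n", "-P",
           "-S", start, "-E", end, "-o", SACCT_FIELDS]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout.splitlines()


def _parse(timestamp):
    """Parse a default-format sacct timestamp; None for 'Unknown' etc."""
    try:
        return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return None


def summarize(lines):
    """Map user -> (wait hours, run hours, wait/run ratio or None)."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for line in lines:
        parts = line.strip().split("|")
        if len(parts) != 4:
            continue  # skip malformed or empty lines
        user, submit, start, end = parts
        t_submit, t_start, t_end = _parse(submit), _parse(start), _parse(end)
        if t_submit is None or t_start is None or t_end is None:
            continue  # job never started or has not finished yet
        totals[user][0] += (t_start - t_submit).total_seconds() / 3600.0
        totals[user][1] += (t_end - t_start).total_seconds() / 3600.0
    summary = {}
    for user, (wait_h, run_h) in totals.items():
        ratio = wait_h / run_h if run_h > 0 else None
        summary[user] = (wait_h, run_h, ratio)
    return summary
```

On a cluster with accounting enabled, one month could be summarized with 
summarize(fetch_sacct_lines("2022-08-01", "2022-09-01")). A job like the 
one Loris describes (about 16 hours waiting, 4 minutes running) gives a 
wait/run ratio of roughly 240.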

/Ole


