[slurm-users] derived counters

Juergen Salk juergen.salk at uni-ulm.de
Tue Apr 13 16:29:55 UTC 2021


* Heckes, Frank <heckes at mps.mpg.de> [210413 12:04]:

> This results from a management question: how long do jobs have to wait (in s, min, h, days) before they get executed, and 
> how many jobs are waiting (are queued) in each partition in a certain time interval? 
> The first one is easy to find with sacct from the submit and start times + difference + averaging.

Hi Frank,

depending on the definition of "waiting time", the "reserved" field 
from sacct may be more appropriate than "start" minus "submit". For
example, for dependency jobs (aka chain jobs) the latter also 
counts the time a job had to wait for another job to finish, 
whereas "reserved" only starts counting when a job becomes
eligible.

However, the "eligible" and "reserved" fields in sacct are also 
set or increased if a job has hit a resource throttling limit, 
which may be something you want to factor out of the job waiting time
as well. 
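
To illustrate the difference between the two definitions, here is a 
minimal Python sketch. It assumes the records come from something like 
"sacct -a -X -n -P -o JobID,Submit,Eligible,Start" (pipe-separated, 
ISO timestamps) and that "reserved" roughly equals "start" minus 
"eligible". The function name and the sample record are made up for 
illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S"  # timestamp layout assumed for the sacct output


def wait_times(line):
    """Return (start - submit, start - eligible) in seconds for one record.

    `line` is assumed to look like 'JobID|Submit|Eligible|Start'.
    Records without real timestamps (e.g. jobs that never started)
    are not handled here and must be filtered out beforehand.
    """
    _job_id, submit, eligible, start = line.strip().split("|")
    t_submit, t_eligible, t_start = (
        datetime.strptime(t, FMT) for t in (submit, eligible, start)
    )
    return (
        (t_start - t_submit).total_seconds(),
        (t_start - t_eligible).total_seconds(),
    )


# Hypothetical chain job: submitted 08:00, eligible 10:00, started 10:30
sample = "1234|2021-04-13T08:00:00|2021-04-13T10:00:00|2021-04-13T10:30:00"
since_submit, since_eligible = wait_times(sample)
print(since_submit, since_eligible)
```

For this made-up job, the two hours spent waiting for the dependency 
show up in the first value but not in the second. Time spent blocked 
by a throttling limit would still be included in both.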

Unfortunately, I haven't found any metric in sacct that counts only
(or allows one to derive) the time a job had to wait just for 
sufficient resources to become available. Maybe someone else has?

> The second is a bit cumbersome, so I wonder whether a 'solution' is
> already around. The easiest way is to monitor from the beginning and
> store the squeue output for later evaluation. Unfortunately I didn’t
> do that.

Not sure if this is a solution for you but I think you can at 
least resample this retrospectively from sacct by using something like 

  sacct -a -X -S 2021-04-01T00:00:00 -s PD -o JobID,User,Partition

This will return job records for all jobs that were in pending state 
at the specified time.
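
To get queue lengths per partition over a time interval, the command 
above could be repeated for a series of sampling times and the records 
counted per partition. The following is only a sketch of that idea: it 
builds the argument lists without running them and counts partitions 
in a made-up sample. It assumes the additional sacct options -n (no 
header) and -P (pipe-separated output). The helper names, users and 
partition names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def pending_per_partition(sacct_output):
    """Count job records per partition in 'JobID|User|Partition' lines."""
    counts = Counter()
    for line in sacct_output.splitlines():
        fields = line.strip().split("|")
        if len(fields) == 3:  # skip empty or malformed lines
            counts[fields[2]] += 1
    return counts


def snapshot_commands(start, end, step):
    """Build one sacct argument list per sampling time in [start, end]."""
    cmds = []
    t = start
    while t <= end:
        cmds.append([
            "sacct", "-a", "-X", "-n", "-P",
            "-S", t.strftime("%Y-%m-%dT%H:%M:%S"),
            "-s", "PD",
            "-o", "JobID,User,Partition",
        ])
        t += step
    return cmds


# Made-up output of a single snapshot
sample = "101|alice|batch\n102|bob|batch\n103|alice|gpu\n"
counts = pending_per_partition(sample)
print(dict(counts))

# Hourly sampling times for the first two hours of 2021-04-01
cmds = snapshot_commands(
    datetime(2021, 4, 1), datetime(2021, 4, 1, 2), timedelta(hours=1)
)
print(len(cmds))
```

Each argument list could then be passed to subprocess.run() on a host 
with access to the accounting database, and its output fed to 
pending_per_partition(). The sampling step determines how fine-grained 
the resulting queue-length history is.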

Best regards
Jürgen


> Cheers,
> -Frank
> 
> > The "slurmacct" command prints (possibly for a specified partition) the
> > average job waiting time while Pending in the queue, but not the queue length
> > information.
> > 
> > It may be difficult to answer your question from the Slurm database.  The sacct
> > command displays accounting data for all jobs and job steps, but not directly
> > for partitions.
> > 
> > There are other Slurm monitoring tools which perhaps can supply the data you
> > are looking for.  You could ask this list again.
> > 
> > /Ole
> 



