[slurm-users] Time spent in PENDING/Priority

Oren Shani oren.shani at mail.huji.ac.il
Sun Dec 10 05:31:47 UTC 2023


Hi Chip,

As others already answered, there is no complete solution for this problem,
because SLURM does not record how a job's wait time breaks down into the
various pending states and reasons. As far as I know, the best you can do
is to treat StartTime - EligibleTime as the actual wait time. You are
correct that this still includes some expected waiting, but that expected
waiting is usually very short. So what I do is look for jobs with a
relatively long gap between EligibleTime and StartTime, and then try to
correlate that with other factors, such as how many resources were free
at that time.
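A minimal sketch of the StartTime - EligibleTime approach, assuming pipe-delimited output from `sacct -X -n -P --format=JobID,Eligible,Start` and sacct's default ISO timestamps (i.e. `SLURM_TIME_FORMAT` not overridden). The helper names (`fetch_sacct`, `eligible_waits`, `long_waits`) and the threshold are hypothetical; jobs whose Start is reported as `Unknown` (not started yet) are skipped. Correlating the flagged jobs with free resources at the time is not covered here.

```python
import subprocess
from datetime import datetime

# Assumes sacct's default ISO timestamps (SLURM_TIME_FORMAT not overridden),
# e.g. 2023-12-07T22:12:00. Unset times show up as "Unknown" or "None".
TIME_FMT = "%Y-%m-%dT%H:%M:%S"


def parse_time(value):
    """Return a datetime, or None if sacct reported no usable time."""
    try:
        return datetime.strptime(value.strip(), TIME_FMT)
    except ValueError:
        return None


def fetch_sacct(start, end):
    """Hypothetical helper: query all users' job allocations in a time window."""
    return subprocess.run(
        ["sacct", "-X", "-n", "-P", "-a",
         f"--starttime={start}", f"--endtime={end}",
         "--format=JobID,Eligible,Start"],
        capture_output=True, text=True, check=True,
    ).stdout


def eligible_waits(sacct_output):
    """Yield (job_id, seconds) = StartTime - EligibleTime for each started job."""
    for line in sacct_output.splitlines():
        if not line.strip():
            continue
        job_id, eligible, start = line.split("|")
        t_eligible, t_start = parse_time(eligible), parse_time(start)
        if t_eligible is None or t_start is None:
            continue  # not started yet, or no eligible time recorded
        yield job_id, (t_start - t_eligible).total_seconds()


def long_waits(sacct_output, threshold_s):
    """Jobs whose eligible-to-start wait exceeds threshold_s, longest first."""
    waits = [w for w in eligible_waits(sacct_output) if w[1] > threshold_s]
    return sorted(waits, key=lambda w: w[1], reverse=True)
```

For example, `long_waits(fetch_sacct("2023-12-01", "2023-12-08"), 3600)` would list jobs that waited more than an hour after becoming eligible during that week, longest wait first; those are the candidates to compare against cluster load.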

I hope that helps,

Oren

On Thu, Dec 7, 2023 at 10:12 PM Chip Seraphine <cseraphine at drwholdings.com> wrote:

> Hi all,
>
> I am trying to find some good metrics for our slurm cluster, and want them
> to reflect a factor that is very important to users: how long they had to
> wait because resources were unavailable. This is a key metric for us
> because it is a decent approximation of how much life could be improved
> if we had more capacity, so it would be an important consideration when
> doing growth planning, setting user expectations, etc. So we are
> specifically interested in how long jobs were in the PENDING state for
> reason Priority.
>
> Unfortunately, I’m finding that this is difficult to pull out of squeue or
> the accounting data. My first thought was that I could simply subtract
> SubmitTime from EligibleTime (or StartTime), but that includes time spent
> in expected ways, e.g. waiting while an array chugs along. The delta
> between StartTime and EligibleTime does not reflect the time spent PENDING
> at all, so it’s not useful either.
>
> I suppose I can gather some of my own metrics by polling squeue or the
> REST interface, but that would be less accurate, more work, and would not
> let me see past history. I was wondering if there was something I was
> missing that someone on the list has figured out. Perhaps some existing
> bit of accounting data can tell me how long a job was stuck behind
> other jobs?
>
> --
>
> Chip Seraphine
> Grid Operations
>
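The squeue-polling alternative mentioned in the quoted message could be sketched roughly as follows. Every `interval_s` seconds it reads `squeue -h -r -t PD -o "%i|%r"` (job id and pending reason, one array task per line) and credits that interval to each job whose reason is `Priority`. The function names and defaults are hypothetical; the totals are only as precise as the polling interval, and nothing before polling started can be recovered.

```python
import subprocess
import time
from collections import defaultdict


def pending_reasons(squeue_output):
    """Map job id -> pending reason from `squeue -h -r -t PD -o %i|%r` output."""
    reasons = {}
    for line in squeue_output.splitlines():
        if "|" in line:
            job_id, reason = line.split("|", 1)
            reasons[job_id.strip()] = reason.strip()
    return reasons


def accumulate(totals, squeue_output, interval_s, reason="Priority"):
    """Credit interval_s seconds to every job currently pending for `reason`.

    Assumes each snapshot is representative of the whole polling interval.
    """
    for job_id, r in pending_reasons(squeue_output).items():
        if r == reason:
            totals[job_id] += interval_s
    return totals


def poll(interval_s=60, rounds=10):
    """Hypothetical driver: sample squeue `rounds` times, return seconds per job."""
    totals = defaultdict(int)
    for _ in range(rounds):
        out = subprocess.run(
            ["squeue", "-h", "-r", "-t", "PD", "-o", "%i|%r"],
            capture_output=True, text=True, check=True,
        ).stdout
        accumulate(totals, out, interval_s)
        time.sleep(interval_s)
    return dict(totals)
```

Sampling is a trade-off: a shorter interval gives finer totals but puts more load on slurmctld, which is part of why the accounting-based StartTime - EligibleTime approach is usually preferred when history matters.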