[slurm-users] derived counters

Matthew BETTINGER matthew.bettinger at external.total.com
Wed Apr 14 14:07:09 UTC 2021


Before you get all excited about it: we have had a terrible time trying to get GPU metrics. We finally abandoned it and switched to Grafana, Prometheus, and InfluxDB. Good luck to you, though.

From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of "Heckes, Frank" <heckes at mps.mpg.de>
Reply-To: Slurm User Community List <slurm-users at lists.schedmd.com>
Date: Wednesday, April 14, 2021 at 1:56 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] derived counters

Hi all, many thanks for all the hints. The link in the latest message points to an impressive switchboard.
Cheers,
-Frank

From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Renfro, Michael
Sent: Tuesday, 13 April 2021 19:25
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] derived counters

I'll never miss an opportunity to plug XDMoD for anyone who doesn't want to write custom analytics for every metric. I've managed to get a little way into its API to extract current values for the number of jobs completed and the number of CPU-hours provided, and insert those into a single-slide presentation for introductory meetings.

You can see a working version of it for the NSF XSEDE facilities at https://xdmod.ccr.buffalo.edu

From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Hadrian Djohari <hxd58 at case.edu>
Date: Tuesday, April 13, 2021 at 8:11 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] derived counters

Hi Frank,

A way to get "how long jobs wait in the queue" is to import the data into XDMoD (https://open.xdmod.org/9.0/index.html).
This nifty reporting tool has many features that make it easier for us to report on cluster usage.

Hadrian

On Tue, Apr 13, 2021 at 8:08 AM Heckes, Frank <heckes at mps.mpg.de> wrote:
Hello Ole,

> >> -----Original Message-----
> >>>    * (average) queue length for a certain partition
>
> I wonder what exactly your question means.  Maybe the number of jobs or
> CPUs in the Pending state?  Maybe relative to the number of CPUs in the
> partition?
>
This results from a management question: how long do jobs have to wait (in s, min, h, days) before they get executed, and
how many jobs are waiting (queued) for each partition in a certain time interval?
The first is easy to find with sacct: take the submit and start times, compute the difference, and average.
The second is a bit cumbersome, so I wonder whether a 'solution' is already around. The easiest way is to monitor from the beginning and store the squeue output for later evaluation. Unfortunately I didn’t do that.
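
Without stored squeue snapshots, both numbers can probably still be approximated after the fact from the accounting records. What follows is only a rough sketch under stated assumptions, not a tested tool. It assumes the input comes from something like `sacct -a -X -n -P -S <start> -E <end> -o JobID,Partition,Submit,Start`, so each line is pipe-separated with ISO timestamps. The sample data and the function names are made up for illustration. A job whose Start field is not a timestamp (e.g. "Unknown") is treated as never started, so jobs cancelled while pending are counted as still waiting, and jobs submitted to several partitions at once are not split up.

```python
from collections import defaultdict
from datetime import datetime

# Made-up sample in the assumed "JobID|Partition|Submit|Start" layout.
SAMPLE = """\
101|batch|2021-04-13T08:00:00|2021-04-13T08:10:00
102|batch|2021-04-13T08:05:00|2021-04-13T08:35:00
103|gpu|2021-04-13T08:00:00|2021-04-13T09:00:00
104|gpu|2021-04-13T08:30:00|Unknown
"""

FMT = "%Y-%m-%dT%H:%M:%S"


def parse_jobs(text):
    """Return (partition, submit, start) tuples; start is None if not started."""
    jobs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        _jobid, partition, submit, start = line.split("|")
        submit_t = datetime.strptime(submit, FMT)
        try:
            start_t = datetime.strptime(start, FMT)
        except ValueError:
            # Anything that is not a timestamp is treated as "not started".
            start_t = None
        jobs.append((partition, submit_t, start_t))
    return jobs


def mean_wait_seconds(jobs):
    """Average (start - submit) in seconds per partition, started jobs only."""
    waits = defaultdict(list)
    for partition, submit, start in jobs:
        if start is not None:
            waits[partition].append((start - submit).total_seconds())
    return {p: sum(w) / len(w) for p, w in waits.items()}


def queue_length_at(jobs, when):
    """Number of jobs per partition that were submitted but not started at `when`."""
    counts = defaultdict(int)
    for partition, submit, start in jobs:
        if submit <= when and (start is None or start > when):
            counts[partition] += 1
    return dict(counts)


if __name__ == "__main__":
    jobs = parse_jobs(SAMPLE)
    print(mean_wait_seconds(jobs))
    print(queue_length_at(jobs, datetime(2021, 4, 13, 8, 7)))
```

Calling queue_length_at() for a series of timestamps (say, every 10 minutes) and averaging the counts per partition would give an approximate average queue length over the interval.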

Cheers,
-Frank

> The "slurmacct" command prints (possibly for a specified partition) the
> average job waiting time while Pending in the queue, but not the queue length
> information.
>
> It may be difficult to answer your question from the Slurm database.  The sacct
> command displays accounting data for all jobs and job steps, but not directly
> for partitions.
>
> There are other Slurm monitoring tools which perhaps can supply the data you
> are looking for.  You could ask this list again.
>
> /Ole


--
Hadrian Djohari
Manager of Research Computing Services, [U]Tech
Case Western Reserve University
(W): 216-368-0395
(M): 216-798-7490