[slurm-users] monitoring and accounting

Reed Dier reed.dier at focusvq.com
Mon Jun 12 14:42:42 UTC 2023

Hey Andrew,

I don’t have any specific examples I can share right this second (I’ll look into making them shareable), but my solution was to throw some basic bash scripts into cron to scrape Slurm and ship the results into influx.

I have one script that looks at sinfo, parsing out the A/I/O/T (allocated/idle/other/total) state for nodes and CPUs, plus a very ugly, hacky sed/cut/awk pipeline to scrape GPU usage. The same script also runs squeue to count jobs per state. Both are broken down per partition and cluster.
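
To give a rough idea of the sinfo half, here is a minimal sketch. It is not my actual script: it is Python rather than bash, and the function names, the `slurm_sinfo` measurement name, and the tag/field names are all made up for illustration. It assumes `sinfo -h -o "%R,%F,%C"` prints one `partition,nodes A/I/O/T,cpus A/I/O/T` row per partition, and that partition and cluster names contain no spaces or commas (those would need escaping in Influx line protocol).

```python
import subprocess
import time

AIOT_KEYS = ("alloc", "idle", "other", "total")


def parse_aiot(value):
    """Turn an 'A/I/O/T' string such as '2/1/0/3' into a dict of ints."""
    parts = value.strip().split("/")
    if len(parts) != len(AIOT_KEYS):
        raise ValueError(f"expected A/I/O/T, got {value!r}")
    return dict(zip(AIOT_KEYS, (int(p) for p in parts)))


def sinfo_to_lines(text, cluster, timestamp_ns):
    """Convert 'partition,nodes A/I/O/T,cpus A/I/O/T' rows to line protocol."""
    lines = []
    for row in text.splitlines():
        cols = row.strip().split(",")
        if len(cols) != 3:
            continue  # skip blank lines or anything that is not a data row
        partition, nodes, cpus = cols
        fields = [f"nodes_{k}={v}i" for k, v in parse_aiot(nodes).items()]
        fields += [f"cpus_{k}={v}i" for k, v in parse_aiot(cpus).items()]
        lines.append(
            f"slurm_sinfo,cluster={cluster},partition={partition} "
            f"{','.join(fields)} {timestamp_ns}"
        )
    return lines


def collect(cluster):
    """Run sinfo once and return line-protocol strings (needs Slurm installed)."""
    out = subprocess.run(
        ["sinfo", "-h", "-o", "%R,%F,%C"],
        capture_output=True, text=True, check=True,
    ).stdout
    return sinfo_to_lines(out, cluster, time.time_ns())
```

From cron you would call something like `collect("mycluster")` and hand the resulting lines to whatever writes to influx for you. Keeping the parsing separate from the sinfo call makes it easy to test against saved output.
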
I have another script that does basic sreport parsing for the TRES/GRES I care about, so that I can get a rough bird’s-eye trend of utilization over time.
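
The sreport side can be sketched in the same spirit. Again, this is an illustration and not my script: the function names are hypothetical, and it assumes pipe-delimited output (sreport’s `-P` option) with a header row. The column names used below (`Allocated`, `Reported`) are assumptions, so check them against what your own sreport prints before relying on them.

```python
def parse_sreport(text):
    """Parse pipe-delimited sreport output into a list of dicts.

    Lines without a '|' (banners, separators, blanks) are ignored; the first
    remaining line is treated as the header row.
    """
    rows = [line.split("|") for line in text.splitlines() if "|" in line]
    if not rows:
        return []
    header, *data = rows
    # Normalise header names, e.g. 'TRES Name' -> 'tres_name'.
    keys = [h.strip().lower().replace(" ", "_") for h in header]
    return [dict(zip(keys, (c.strip() for c in row))) for row in data]


def utilization_percent(record, used_key="allocated", total_key="reported"):
    """Return used/total as a percentage, or 0.0 when total is zero."""
    total = float(record[total_key])
    if total == 0:
        return 0.0
    return 100.0 * float(record[used_key]) / total
```

Feeding each record’s percentage into influx with the TRES name as a tag is enough to get a utilization trend per resource in Grafana.
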

There’s likely to be something far, far better for this, but it was a quick and dirty solution to get something visible with existing tooling (Grafana/influx).


> On Jun 11, 2023, at 6:43 PM, Andrew Elwell <andrew.elwell at gmail.com> wrote:
> On Fri, 2 June 2023, 22:03 Jörg Striewski <striewski at ismll.de> wrote:
>> Hi, we use grafana with influx, it is easy to install and works fine
> Hi Jörg,
> Are your slurm to influx scripts publicly available anywhere? I do something similar for squeue via python subprocess to call
> squeue -M all -a -o "%P,%a,%u,%D,%q,%T,%r"
> And some sinfo calls for node/cpu usage:
> sinfo -M {} -o "%P,%a,%F"
> sinfo -M {} -o "%%R,%a,%C,%B,%z"
> But I'd be interested to see what other places do. Perhaps some examples could be gathered for Ole's wiki?
> Andrew
