[slurm-users] monitoring and accounting
Reed Dier
reed.dier at focusvq.com
Mon Jun 12 14:42:42 UTC 2023
Hey Andrew,
I don’t have any specific examples I can share right this second, but I’ll look into making them shareable. My solution was to throw some basic bash scripts into cron to scrape Slurm's output and ship it into InfluxDB.
I have one script that looks at sinfo, parsing out the A/I/O/T (allocated/idle/other/total) counts for nodes and CPUs, plus a very ugly, hacky sed/cut/awk pipeline to scrape GPU usage. The same script also runs squeue to count jobs per state. Both are collected per partition and per cluster.
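A minimal sketch of the sinfo half of that idea, in Python rather than bash. It assumes the input comes from `sinfo -h -o "%P,%a,%C"` (partition, availability, CPUs as allocated/idle/other/total) and that the result is written as InfluxDB line protocol. The function names, the `slurm_cpus` measurement name, and the tag names are hypothetical, not taken from the actual scripts.

```python
import subprocess


def parse_sinfo_cpus(text):
    """Parse lines of 'partition,avail,A/I/O/T' into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        partition, avail, cpus = line.strip().split(",")
        alloc, idle, other, total = (int(n) for n in cpus.split("/"))
        rows.append({
            # sinfo marks the default partition with a trailing '*'
            "partition": partition.rstrip("*"),
            "avail": avail,
            "alloc": alloc,
            "idle": idle,
            "other": other,
            "total": total,
        })
    return rows


def to_line_protocol(row, cluster, measurement="slurm_cpus"):
    """Format one parsed row as an InfluxDB line-protocol record.

    The measurement and tag names are illustrative assumptions.
    """
    tags = f"cluster={cluster},partition={row['partition']}"
    fields = ",".join(
        f"{key}={row[key]}i" for key in ("alloc", "idle", "other", "total")
    )
    return f"{measurement},{tags} {fields}"


def run_sinfo():
    """Call sinfo on the local cluster and return its raw output."""
    result = subprocess.run(
        ["sinfo", "-h", "-o", "%P,%a,%C"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

From cron, something like `for row in parse_sinfo_cpus(run_sinfo()): print(to_line_protocol(row, "mycluster"))` would emit one record per partition, which can then be posted to whatever write endpoint the InfluxDB instance exposes.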
I have another script that does basic sreport parsing for the TRES/GRES I care about, so that I can get a rough bird's-eye trend of utilization over time.
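The sreport side could be sketched along these lines. This assumes sreport is run with `-P` (parsable output, `|`-delimited) and without `-n`, so the column header row is present. Lines without the delimiter (such as the report banner) are skipped, and column names are taken from whatever header sreport prints rather than hard-coded. `parse_sreport` is a hypothetical helper, not the actual script.

```python
def parse_sreport(text, delimiter="|"):
    """Turn delimiter-separated sreport output into a list of dicts.

    The first line containing the delimiter is treated as the header;
    every later delimited line becomes one row keyed by those column names.
    Values are kept as strings, since their units depend on the -t option.
    """
    lines = [ln for ln in text.splitlines() if delimiter in ln]
    if not lines:
        return []
    header = [name.strip() for name in lines[0].split(delimiter)]
    rows = []
    for line in lines[1:]:
        values = [value.strip() for value in line.split(delimiter)]
        rows.append(dict(zip(header, values)))
    return rows
```

Feeding it the output of something like `sreport -P -t hours cluster utilization start=... end=...` should give one dict per cluster (or per cluster and TRES, if a TRES filter is requested), ready to be turned into time-series points.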
There’s likely to be something far, far better for this, but it was a quick and dirty solution to get something visible with existing tooling (Grafana/influx).
Reed
> On Jun 11, 2023, at 6:43 PM, Andrew Elwell <andrew.elwell at gmail.com> wrote:
>
> On Fri, 2 June 2023, 22:03 Jörg Striewski, <striewski at ismll.de> wrote:
> Hi, we use grafana with influx, it is easy to install and works fine
>
> Hi Jörg,
>
> Are your slurm to influx scripts publicly available anywhere? I do something similar for squeue via python subprocess to call
>
> squeue -M all -a -o "%P,%a,%u,%D,%q,%T,%r"
>
> And some sinfo calls for node/cpu usage:
>
> sinfo -M {} -o "%P,%a,%F"
> sinfo -M {} -o "%R,%a,%C,%B,%z"
>
> But I'd be interested to see what other places do. Perhaps some examples could be gathered for Ole's wiki?
>
> Andrew
>