[slurm-users] Monitoring with Telegraf

Pablo Llopis pablo.llopis at gmail.com
Mon Sep 30 15:37:21 UTC 2019


Hi all,

If you're using collectd to gather metrics, I started writing a Slurm
collectd plugin at https://github.com/pllopis/collectd/tree/slurm.
It provides per-partition info about jobs, node stats, and internal
Slurm metrics such as backfill stats.

In our infrastructure these metrics are shipped to InfluxDB, and they
are mostly what currently powers the Slurm Grafana dashboards we use to
get an overview of the cluster.
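
For anyone on Telegraf rather than collectd, here is a rough sketch of
the same idea (per-partition job counts ending up in InfluxDB). It is
not code from the plugin above. It assumes you feed it the output of
`squeue -h -o "%P %T"` (partition and job state, no header); the
measurement name `slurm_jobs`, the tag names, and the function names
are made up for illustration.

```python
from collections import Counter


def _escape_tag(value):
    """Escape characters that are special in line-protocol tag values."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def squeue_to_line_protocol(squeue_text, timestamp_ns=None):
    """Turn 'PARTITION STATE' lines into InfluxDB line-protocol strings.

    squeue_text is assumed to be the output of: squeue -h -o "%P %T"
    The measurement name 'slurm_jobs' is a hypothetical choice.
    """
    counts = Counter()
    for line in squeue_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        partition, state = fields
        counts[(partition, state.lower())] += 1

    lines = []
    for (partition, state), n in sorted(counts.items()):
        line = (
            f"slurm_jobs,partition={_escape_tag(partition)},"
            f"state={_escape_tag(state)} count={n}i"
        )
        if timestamp_ns is not None:
            line += f" {timestamp_ns}"
        lines.append(line)
    return lines
```

A small script that runs squeue, passes its output through this
function, and prints the resulting lines could be called from
Telegraf's exec input with data_format = "influx". Counting per
(partition, state) pair keeps the number of series small.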

There's an MR open to include it upstream, but I'll have to rework the
types for it to be accepted, which I'm planning to do (hopefully along
with an extension to include further data, maybe from
priorities/multifactor, fairshare, and other Slurm data sources).

Cheers,
Pablo

On Fri, Sep 27, 2019 at 11:34 AM Josef Dvoracek <jose at fzu.cz> wrote:
>
> some time ago I wrote this small collector,
> https://github.com/jose-d/influxdb-collectors/tree/master/slurm_metric_writer.
>
> Until you write or find a better one, feel free to use it, send PRs
> with improvements, etc. :)
>
> cheers.
>
> josef
>
> On 26. 09. 19 17:15, Marcus Boden wrote:
> > Hey everyone,
> >
> > I am using Telegraf and InfluxDB to monitor our hardware and I'd like to
> > include some Slurm metrics in this. Is there already a Telegraf plugin
> > for monitoring Slurm that I don't know about, or do I have to start from
> > scratch?
> >
> > Best,
> > Marcus
>


