[slurm-users] Efficiency of the profile influxdb plugin for graphing live job stats

Andrew Elwell andrew.elwell at gmail.com
Sat Feb 22 10:15:53 UTC 2020


On Sat, 14 Dec 2019 at 04:25, Lech Nieroda <lech.nieroda at uni-koeln.de> wrote:

[OK, so I'm a bit lagged finding this]

> I’ve been tinkering with the acct_gather_profile/influxdb plugin a bit in order to visualize the cpu and memory usage of live jobs.
> Both the influxdb backend and Grafana dashboards seem like a perfect fit for our needs.

Ditto - I've been working on dashboards for jobcomp/elasticsearch too
(I'll push them to grafana.com once they look "shiny" and useful), as we
use collectd/influxdb/grafana for most of our node monitoring.

[snip]
"value = NNN" is a pain when you're trying to plot these.

> So a single "series" would be:
> Measurement: acct_gather_profile_task   Tags: job, step, task, host   Fields: CPUTime, CPUUtilization, CPUFrequency, RSS, VMSize, Pages, ReadMB, WriteMB    Timestamp

YES! Make it so! Much more efficient, and you can add type qualifiers
(float/int) to the fields as needed, as per
https://docs.influxdata.com/influxdb/v1.7/write_protocols/line_protocol_reference/
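
For illustration, here is a minimal Python sketch of what one point in the
proposed layout might look like in line protocol. The to_line helper is
hypothetical (it is not part of Slurm or of any InfluxDB client), the split
of fields into ints and floats is an assumption, and the RSS value is a
made-up placeholder. The tags, timestamp and CPUUtilization value come from
the CPUUtilization sample further down.

```python
def escape(text):
    # Commas, equals signs and spaces must be backslash-escaped
    # in tag keys, tag values and field keys.
    return str(text).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def format_value(value):
    # Line protocol marks integers with a trailing "i"; floats are the default type.
    return f"{value}i" if isinstance(value, int) else repr(float(value))


def to_line(measurement, tags, fields, timestamp_ns):
    # Hypothetical helper: one measurement, many fields, one timestamp.
    # The measurement name is assumed to contain no special characters.
    tag_part = ",".join(f"{escape(k)}={escape(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{escape(k)}={format_value(v)}" for k, v in fields.items())
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"


line = to_line(
    "acct_gather_profile_task",
    {"job": 663, "step": 0, "task": 0, "host": "client1"},
    {"CPUUtilization": 99.87, "RSS": 1024},  # RSS is a placeholder value
    1582365412000000000,
)
print(line)
```

Tags are sorted by key here because the line protocol reference recommends
it for write performance; the field order is left as given.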

Sounds like a good plan.

I've been testing this just now and agree that the current schema design
is crap: every metric is its own measurement with a single "value" field.

[root@alfred ~]# influx
Connected to http://localhost:8086 version 1.7.8
InfluxDB shell version: 1.7.8
> use slurm
Using database slurm
> show measurements
name: measurements
name
----
CPUFrequency
CPUTime
CPUUtilization
Pages
RSS
ReadMB
VMSize
WriteMB
> select * from CPUUtilization
name: CPUUtilization
time                host    job step task value
----                ----    --- ---- ---- -----
1582364991000000000 client1 662 -2   0    0
1582365021000000000 client1 662 -2   0    0
1582365051000000000 client1 662 -2   0    0
1582365081000000000 client1 662 -2   0    0
1582365352000000000 client1 663 0    0    0
1582365382000000000 client1 663 0    0    99.8
1582365412000000000 client1 663 0    0    99.87
1582365442000000000 client1 663 0    0    99.83
1582365472000000000 client1 663 0    0    98.73
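
With one measurement per metric, anything that wants CPU and memory side by
side has to join the rows itself on (time, host, job, step, task). A rough
Python sketch of that join, assuming the rows have already been fetched into
plain dicts (the pivot function and the input layout are hypothetical, and
the RSS values are made-up placeholders; the CPUUtilization rows are from
the output above):

```python
from collections import defaultdict


def pivot(per_metric_rows):
    # Merge {metric: [row, ...]} into one record per (time, host, job, step, task).
    merged = defaultdict(dict)
    for metric, rows in per_metric_rows.items():
        for row in rows:
            key = (row["time"], row["host"], row["job"], row["step"], row["task"])
            merged[key][metric] = row["value"]
    return dict(merged)


def row(time, value):
    # All sample rows share the same host/job/step/task.
    return {"time": time, "host": "client1", "job": 663, "step": 0, "task": 0, "value": value}


per_metric_rows = {
    "CPUUtilization": [row(1582365382000000000, 99.8), row(1582365412000000000, 99.87)],
    # Placeholder RSS values, only here to show the merge.
    "RSS": [row(1582365382000000000, 1024), row(1582365412000000000, 2048)],
}

merged = pivot(per_metric_rows)
for key in sorted(merged):
    print(key, merged[key])
```

A single measurement with several fields would hand back this shape
directly, without the client-side join.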


Out of interest, what retention policy are you using for profile data?

Andrew


