[slurm-users] Efficiency of the profile influxdb plugin for graphing live job stats

Lech Nieroda lech.nieroda at uni-koeln.de
Fri Dec 13 17:22:08 UTC 2019


I’ve been tinkering with the acct_gather_profile/influxdb plugin a bit in order to visualize the CPU and memory usage of live jobs.
Both the influxdb backend and Grafana dashboards seem like a perfect fit for our needs.

I’ve run into an issue though and made a crude workaround for it, maybe someone knows a better way?

A few words about influxdb and the influxdb plugin:
InfluxDB is a NoSQL database that organizes its data in "series": unique combinations of a "measurement" and its "tags", which correspond roughly to tables and their indexed fields in a relational database.
A single "series" can reference a multitude of timestamped records, each described further by non-indexed "fields".
The acct_gather_profile/influxdb plugin defines its data points for each job task/step as follows:

Measurement: CPUTime   Tags: job, host, step, task   Fields: value   Timestamp
Measurement: CPUUtilization   Tags: job, host, step, task   Fields: value   Timestamp

e.g. a single record would look like:
CPUTime,job=12465711,step=0,task=3,host=node20307 value=20.80 1576054517

The default "Task" profile contains 8 such characteristics: CPUTime, CPUUtilization, CPUFrequency, RSS, VMSize, Pages, ReadMB, WriteMB.

This data structure means that for each combination of job, step, task and host, 8 unique "series" are created, e.g. "CPUTime, job, step, task, host", "CPUUtilization, job, step, task, host", and so on.
Those "series" then reference the timestamped values of the respective measurements. The "tags" can be used to "group by" in queries, e.g. to plot the performance of a single job on a specified host.
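To make the default layout concrete, here is a small Python sketch that emits the line-protocol records one sample would produce under this schema. The helper name and the filler values are illustrative, not taken from the plugin source:

```python
# Sketch of the plugin's default per-measurement layout: one
# line-protocol record per characteristic, so every job/step/task/host
# combination creates 8 distinct series. Values are made up.
MEASUREMENTS = ["CPUTime", "CPUUtilization", "CPUFrequency",
                "RSS", "VMSize", "Pages", "ReadMB", "WriteMB"]

def default_records(job, step, task, host, values, ts):
    """Return one line-protocol record per measurement (default schema)."""
    return [
        f"{m},job={job},step={step},task={task},host={host} "
        f"value={values[m]} {ts}"
        for m in MEASUREMENTS
    ]

# One sample for one task: filler values, except CPUTime from the example above.
sample = {m: 1.0 for m in MEASUREMENTS}
sample["CPUTime"] = 20.80
for line in default_records(12465711, 0, 3, "node20307", sample, 1576054517):
    print(line)
```

One sample thus turns into 8 records, each belonging to its own series.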

OK, so what’s the problem?
There are two: the number of created "series" and data redundancy.
InfluxDB limits the number of "series" to 1 million by default, and for good reason: each "series" increases RAM usage, since it is part of the in-memory index.
The number of "series", or "series cardinality", is one of the most important factors determining memory usage; the InfluxDB manual considers a cardinality above 10 million "probably infeasible".
When you consider that each new job/host/step/task combination creates 8 "series", the default limit can be reached relatively quickly, and performance problems follow.
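A back-of-the-envelope calculation shows how quickly that happens; the job and task counts below are assumptions for illustration, not measurements from our cluster:

```python
# Hypothetical series-cardinality estimate for the default schema.
measurements = 8          # characteristics in the default "Task" profile
jobs = 50_000             # assumed number of profiled jobs retained in the DB
combos_per_job = 4        # assumed average step/task/host combinations per job

series = measurements * jobs * combos_per_job
print(series)  # 1,600,000 -> already past the default 1 million limit
```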
As to data redundancy: for each timestamp, the same set of tags is stored multiple times, once under each of the 8 "measurements".

The current workaround: store the 8 characteristics as "fields" rather than "measurements", thus creating 1 series per job/step/task/host combination rather than 8. This also reduces data redundancy, saving roughly 70% of storage space.

So a single "series" would be:
Measurement: acct_gather_profile_task   Tags: job, step, task, host   Fields: CPUTime, CPUUtilization, CPUFrequency, RSS, VMSize, Pages, ReadMB, WriteMB   Timestamp
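Under this workaround schema, the sample record from the earlier example collapses into a single line-protocol record; a sketch (field order and formatting are illustrative):

```python
# Sketch of the workaround layout: all 8 characteristics become fields
# of one measurement, so each job/step/task/host combination creates
# exactly one series. Values are made up for illustration.
FIELDS = ["CPUTime", "CPUUtilization", "CPUFrequency",
          "RSS", "VMSize", "Pages", "ReadMB", "WriteMB"]

def merged_record(job, step, task, host, values, ts):
    """Return one record carrying all 8 characteristics as fields."""
    fields = ",".join(f"{f}={values[f]}" for f in FIELDS)
    return (f"acct_gather_profile_task,job={job},step={step},"
            f"task={task},host={host} {fields} {ts}")

sample = {f: 1.0 for f in FIELDS}
sample["CPUTime"] = 20.80
print(merged_record(12465711, 0, 3, "node20307", sample, 1576054517))
```

The tags and timestamp are written once per sample instead of 8 times, which is where most of the storage savings come from.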

Another benefit is that identical "measurement" names such as "WriteMB", which are used by both the task and the lustre/fs profile plugins, can now be told apart by their measurement name.

Further Ideas?

Kind regards,
