[slurm-users] slurm reporting

Mark Hahn hahn at mcmaster.ca
Tue Nov 26 17:20:51 UTC 2019

> Would Grafana do similar job as XDMoD?

I was wondering whether to pipe up.  I work for ComputeCanada, which runs a
number of significant clusters.  During a major upgrade a few years ago,
we looked at XDMoD and decided against it, primarily because we wanted
greater flexibility: we have specific tracking requirements related to the
national allocation process, and also wanted better support for many sites.

What we have now is an ElasticSearch-based system, which is accessible via 
Grafana and other mechanisms.  It integrates multiple sources of data, such
as job completion records (scraped very much like XDMoD does it), as well as 
syslog and other monitoring/collection mechanisms.  It also feeds data into 
some pre-existing database/reporting mechanisms.
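
A minimal sketch of what scraping job completion records could look like,
not the actual ComputeCanada pipeline.  It assumes the records come from
something like "sacct -a -X -n -P --format=JobID,User,Account,Elapsed,State"
(pipe-delimited, no header) and turns them into the newline-delimited JSON
that the Elasticsearch bulk API accepts.  The index name "slurm-jobs", the
added Cluster and ElapsedSeconds fields, and the function names are all
hypothetical.

```python
import json

# Assumed column order; must match the --format list given to sacct.
FIELDS = ["JobID", "User", "Account", "Elapsed", "State"]


def elapsed_to_seconds(text):
    """Convert sacct's [DD-[HH:]]MM:SS elapsed format to seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in text.split(":")]
    while len(parts) < 3:          # pad a missing hours field with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def parse_sacct(output, cluster):
    """Turn pipe-delimited sacct lines into a list of dicts."""
    docs = []
    for line in output.splitlines():
        if not line.strip():
            continue
        record = dict(zip(FIELDS, line.split("|")))
        record["ElapsedSeconds"] = elapsed_to_seconds(record["Elapsed"])
        record["Cluster"] = cluster   # lets one index hold many clusters
        docs.append(record)
    return docs


def to_bulk(docs, index="slurm-jobs"):
    """Render docs as an Elasticsearch bulk request body (NDJSON)."""
    lines = []
    for doc in docs:
        # Cluster plus job ID as document ID, so re-scraping overwrites
        # the existing record instead of duplicating it.
        doc_id = f'{doc["Cluster"]}-{doc["JobID"]}'
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"
```

The resulting string would be POSTed to the cluster's _bulk endpoint;
tagging each record with its cluster name is one simple way to keep
multiple sites in a single index.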

It's certainly not perfect, but I mention it here because there does seem to
be a series of queries about managing cluster metadata beyond single Slurm
instances.  For instance, an external repository of job records means you can
more freely upgrade a cluster's Slurm, since you know all the job data is 
in an external, scalable system, and you don't have to baby slurmdbd as much.

So I think what I'm saying is that I'd encourage people to think about using 
some of the powerful, open-source infrastructure that exists for parts of
this task.  Kibana or Grafana make it incredibly easy to do basic analysis
like averages per user.  And having the data in an open infrastructure also
means that if you want, you can write a 10-line Python script to generate 
a report (maybe joining data in a way Grafana doesn't let you).  Or if you 
want to create automated actions (email notice, etc), even mods to Slurm
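
To give a feel for the short reporting script mentioned above, here is a
rough sketch of a per-user average built on an Elasticsearch terms
aggregation with a nested avg.  The field names "User" and
"ElapsedSeconds" and the function names are hypothetical, and "User" is
assumed to be mapped as a keyword field.  The HTTP call itself is left
out: the query body would be sent to the index's _search endpoint with
whatever client you prefer, and the decoded JSON response passed to
report().

```python
def build_query(user_field="User", value_field="ElapsedSeconds", max_users=100):
    """Build a search body that averages value_field per user."""
    return {
        "size": 0,  # only the aggregation is wanted, not the matching docs
        "aggs": {
            "per_user": {
                "terms": {"field": user_field, "size": max_users},
                "aggs": {"avg_value": {"avg": {"field": value_field}}},
            }
        },
    }


def report(response):
    """Format the aggregation buckets as text, highest average first."""
    buckets = response["aggregations"]["per_user"]["buckets"]
    rows = [
        (b["key"], b["doc_count"], b["avg_value"]["value"])
        for b in buckets
        if b["avg_value"]["value"] is not None  # skip users with no values
    ]
    rows.sort(key=lambda row: row[2], reverse=True)
    return "\n".join(
        f"{user:<12} {jobs:>6} {avg:>12.1f}" for user, jobs, avg in rows
    )
```

Because the response is plain JSON, the same rows could just as easily be
joined against another data source before printing.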

Mark Hahn | SHARCnet Sysadmin | hahn at sharcnet.ca | http://www.sharcnet.ca
           | McMaster RHPCS    | hahn at mcmaster.ca | 905 525 9140 x24687
           | Compute/Calcul Canada                | http://www.computecanada.ca
