[slurm-users] slurm reporting

Henkel, Andreas henkel at uni-mainz.de
Thu Nov 28 07:20:56 UTC 2019

Hi Mark,

Thanks for your insight. We also work with Elasticsearch, and I appreciate how easy the analysis is (once one understands Kibana's logic). Do you use the job completion plugin as is? Or did you modify it to support SSL or to record additional metrics?


On 26.11.2019 at 18:27, Mark Hahn <hahn at mcmaster.ca> wrote:

>> Would Grafana do similar job as XDMoD?
> I was wondering whether to pipe up.  I work for ComputeCanada, which runs a
> number of significant clusters.  During a major upgrade a few years ago,
> we looked at XDMoD, and decided against it.  Primarily because we wanted greater flexibility - we have specific tracking requirements related to the
> national allocation process, and also wanted better support for many sites.
> What we have now is an ElasticSearch-based system, which is accessible via Grafana and other mechanisms.  It integrates multiple sources of data, such
> as job completion records (scraped very much like XDMoD does it), as well as syslog and other monitoring/collection mechanisms.  It also feeds data into some pre-existing database/reporting mechanisms.
> It's certainly not perfect, but I mention it here because there does seem to
> be a series of queries about managing cluster metadata beyond single Slurm
> instances.  For instance, an external repository of job records means you can
> more freely upgrade a cluster's Slurm, since you know all the job data is in an external, scalable system, and you don't have to baby slurmdbd as much.
> So I think what I'm saying is that I'd encourage people to think about using some of the powerful, open-source infrastructure that exists for parts of
> this task.  Kibana or Grafana make it incredibly easy to do basic analysis
> like averages per user.  And having the data in an open infrastructure also
> means that, if you want, you can write a 10-line Python script to generate a report (perhaps joining data in a way Grafana doesn't let you).  Or you can create automated actions (email notices, etc.), even modifications to Slurm
> controls.
> regards,
> --
> Mark Hahn | SHARCnet Sysadmin | hahn at sharcnet.ca | http://www.sharcnet.ca
>          | McMaster RHPCS    | hahn at mcmaster.ca | 905 525 9140 x24687
>          | Compute/Calcul Canada                | http://www.computecanada.ca
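The "10-line Python script" Mark mentions could look roughly like the sketch below, which computes the average elapsed time per user (the same "averages per user" view he describes in Kibana/Grafana). It is only an illustration under stated assumptions: it takes job completion records that have already been exported from Elasticsearch as a list of dicts, and the keys `username` and `elapsed` (in seconds) are hypothetical; the actual field names depend on the jobcomp plugin and any local modifications. The sample records are made up.

```python
from collections import defaultdict


def average_elapsed_per_user(records):
    """Return {user: mean elapsed seconds} for a list of job records.

    ASSUMPTION: each record is a dict with hypothetical keys 'username'
    and 'elapsed' (seconds); adjust to match your jobcomp index fields.
    """
    totals = defaultdict(float)  # sum of elapsed seconds per user
    counts = defaultdict(int)    # number of jobs per user
    for rec in records:
        user = rec.get("username")
        elapsed = rec.get("elapsed")
        if user is None or elapsed is None:
            continue  # skip records missing either field
        totals[user] += float(elapsed)
        counts[user] += 1
    return {user: totals[user] / counts[user] for user in totals}


if __name__ == "__main__":
    # Made-up sample records, standing in for an Elasticsearch export.
    sample = [
        {"username": "alice", "elapsed": 3600},
        {"username": "alice", "elapsed": 1800},
        {"username": "bob", "elapsed": 600},
    ]
    for user, avg in sorted(average_elapsed_per_user(sample).items()):
        print(f"{user}: {avg:.0f} s")
```

Keeping the aggregation in plain Python, rather than in a dashboard, is what makes it easy to join these records with other sources (for example, allocation data) before reporting.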
