[slurm-users] Nagios or Other Monitoring Plugins

Lachlan Musicman datakid at gmail.com
Thu Jan 18 14:34:38 MST 2018

On 19 January 2018 at 07:29, Ryan Novosielski <novosirj at rutgers.edu> wrote:

> Hi all,
> Looked back at the mailing list to see if there was a question about this
> already. There was some mention of /using/ Nagios, but no real mention of
> specifics. What do people monitor with Nagios? We monitor, so far,
> slurmctld, slurmdbd, and MySQL, but there are probably some others. Might
> be helpful to run “scontrol ping” for example, or similar, on our login
> nodes.
> Does anyone have any plugins they’ve written or ideas they can share?
> Nagios Exchange doesn’t have anything with SLURM anywhere in the name.
> Thanks!

Off the top of my head the only other two that I would want explicitly
would be:
 - ntp/chrony, i.e. whichever time daemon (ntpd or chronyd) you run. Nodes go
offline when their clocks drift too far, especially if you are using Munge.
 - authentication system - in our case ipa/sssd. Without that, even the
queued jobs will fail.
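
On the "scontrol ping" idea from the original question, here is a minimal
sketch of what a Nagios-style check could look like in Python. It is not a
tested plugin. It assumes that the output of "scontrol ping" reports each
controller with the word UP or DOWN; the exact wording differs between Slurm
versions, so only those two tokens are inspected. The function names
(classify_ping_output, main) and the 10 second timeout are made up for
illustration. The exit codes 0/1/2/3 are the usual Nagios plugin convention
for OK/WARNING/CRITICAL/UNKNOWN.

```python
#!/usr/bin/env python3
"""Nagios-style wrapper around `scontrol ping` (illustrative sketch only)."""
import re
import subprocess
import sys

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_ping_output(text):
    """Map `scontrol ping` output to an (exit_code, message) pair.

    Assumption: each controller is reported with the token UP or DOWN.
    Nothing else about the output format is relied upon.
    """
    states = re.findall(r"\b(UP|DOWN)\b", text)
    summary = " ".join(text.split())  # collapse whitespace to one line
    if not states:
        return UNKNOWN, "UNKNOWN - could not parse: " + summary
    if all(state == "DOWN" for state in states):
        return CRITICAL, "CRITICAL - " + summary
    if "DOWN" in states:
        # At least one controller answers, at least one does not.
        return WARNING, "WARNING - " + summary
    return OK, "OK - " + summary


def main():
    """Run `scontrol ping`, print one status line, return the exit code."""
    try:
        proc = subprocess.run(
            ["scontrol", "ping"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=10,  # hypothetical value, tune for your site
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        print("UNKNOWN - could not run scontrol: {}".format(exc))
        return UNKNOWN
    code, message = classify_ping_output(proc.stdout)
    print(message)
    return code

# To use this as a plugin, end the file with: sys.exit(main())
```

Keeping the parsing in its own function means it can be checked against
sample output from your own Slurm version without a running controller.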

We use Zabbix in house. I was under the impression that people were moving
toward Icinga 2 over Nagios.
