[slurm-users] NHC and slurm

Heitor heitorpbittencourt at gmail.com
Thu Apr 15 13:58:31 UTC 2021


Hello,

I'm trying to set up NHC [0] for our Slurm cluster, but I can't get
it to work properly.

I'm using the dev branch from [0] and compiled it this way:

$ ./autogen.sh --prefix=/usr --sysconfdir=/etc --libexecdir=/usr/lib
$ make test
$ sudo make install

When I run nhc, I get an error that sshd is not running:

$ sudo nhc
ERROR:  nhc:  Health check failed:  check_ps_service:  Service sshd (process sshd) owned by root not running

I know sshd is running, because I logged in to this machine via ssh, and
`systemctl status sshd` shows it as active.
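To compare the name in the error message with what the kernel actually reports, a small script can list every process whose name contains `ssh`. This is a minimal sketch, assuming a Linux `/proc` filesystem. It does not reproduce NHC's own matching logic, and `process_names` and `find_processes` are hypothetical helpers. It prints both the kernel's short task name (`comm`) and the first field of the command line, because a daemon such as sshd can rewrite its own process title so that the two differ.

```python
import os


def process_names(proc_root="/proc"):
    """Yield (pid, comm, argv0) for each numeric PID directory under proc_root.

    comm is the kernel's short task name (truncated to 15 characters);
    argv0 is the first NUL-separated field of cmdline, which a daemon can
    overwrite when it rewrites its own process title.
    """
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "comm")) as f:
                comm = f.read().strip()
            with open(os.path.join(base, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited or its files are not readable
        argv0 = raw.split(b"\0", 1)[0].decode(errors="replace")
        yield int(entry), comm, argv0


def find_processes(name, proc_root="/proc"):
    """Return (pid, comm, argv0) entries whose comm or argv0 contains name."""
    return sorted(
        (pid, comm, argv0)
        for pid, comm, argv0 in process_names(proc_root)
        if name in comm or name in argv0
    )


if __name__ == "__main__" and os.path.isdir("/proc"):
    # List every ssh-related process with both of its names side by side.
    for pid, comm, argv0 in find_processes("ssh"):
        print(f"{pid:>7}  comm={comm!r}  argv0={argv0!r}")
```

Running it on the node (with sudo, so all processes are readable) shows whether the sshd entries carry a name other than plain `sshd`.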

Here's a sample of my nhc.conf:

   * || check_ps_service munged
   * || check_ps_service -u root sshd
   * || check_ps_service -u root ssh
   * || check_ps_service ssh
   * || check_ps_service sshd

If I run `sudo nhc -a` to run all the checks, it reports 4 errors about
ssh.

NHC can find munge running, so what's the problem with ssh? What am I
missing?

I'm using Ubuntu 20.04.

Cheers,
Heitor


[0] https://github.com/mej/nhc/