[slurm-users] NHC and slurm
Heitor
heitorpbittencourt at gmail.com
Thu Apr 15 13:58:31 UTC 2021
Hello,
I'm trying to set up NHC[0] for our Slurm cluster, but I can't get it
to work properly.
I'm using the dev branch from [0], which I compiled this way:
$ ./autogen.sh --prefix=/usr --sysconfdir=/etc --libexecdir=/usr/lib
$ make test
$ sudo make install
When I run nhc, I get an error that sshd is not running:
$ sudo nhc
ERROR: nhc: Health check failed: check_ps_service: Service sshd (process sshd) owned by root not running
I know sshd is running because I logged in to this machine over ssh, and
`systemctl status sshd` shows it is active.
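For anyone trying to reproduce this, here is a rough sketch of how one might inspect what ps itself reports for these processes, so it can be compared with the names given to check_ps_service. The helper name show_procs is hypothetical and not part of NHC, and the sketch assumes the procps ps that ships with Ubuntu; it makes no claim about how NHC gathers its process list internally.

```shell
# Hypothetical helper (not part of NHC): print the owner, command name, and
# full argument string of every process whose command name or first argument
# word contains the given text. Assumes procps ps (as on Ubuntu).
show_procs() {
    # "user=", "comm=", "args=" select the columns and suppress the header.
    ps -eo user=,comm=,args= | awk -v pat="$1" 'index($2, pat) || index($3, pat)'
}

# Compare how the two services from nhc.conf appear to ps.
show_procs sshd
show_procs munged
```

Any difference between the command name and the argument string in that output would be a reasonable first thing to compare against the service names in nhc.conf.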
Here's a sample of my nhc.conf:
* || check_ps_service munged
* || check_ps_service -u root sshd
* || check_ps_service -u root ssh
* || check_ps_service ssh
* || check_ps_service sshd
If I run `sudo nhc -a` to run all the tests, it gives 4 errors about
ssh.
NHC can find munge running, so what's the problem with ssh? What am I
missing?
I'm using Ubuntu 20.04.
Cheers,
Heitor
[0] https://github.com/mej/nhc/