[slurm-users] slurmd.service fails to register

William Brown william at signalbox.org.uk
Tue Dec 17 13:14:27 UTC 2019


These are the tests that we use:

The following steps can be performed to verify that the software has been
properly installed and configured.  These should be done as a
non-privileged user:

• Generate a credential on stdout:

$ munge -n

• Check if a credential can be locally decoded:

$ munge -n | unmunge

• Check if a credential can be remotely decoded:

$ munge -n | ssh <somehost> unmunge

The remote test requires that passwordless SSH logins have been set up
between the hosts.  This is simplest when the user's home directory is
mounted on every node of the cluster.
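
For repeated use, the three checks above could be wrapped in a small shell
function, roughly as sketched below. This is only an illustration:
check_munge is a made-up helper name, the remote hostname is an optional
argument, and the sketch relies on unmunge (and ssh, which passes the remote
exit status through) returning non-zero when a credential cannot be decoded.

```shell
#!/bin/sh
# Sketch only: run the three MUNGE checks in order, stopping at the first failure.
# check_munge is a hypothetical helper name, not part of MUNGE or Slurm.
# Usage: check_munge [remote_host]
check_munge() {
    remote_host="${1:-}"

    # 1. Generate a credential (output discarded; only the exit status matters).
    if ! munge -n >/dev/null; then
        echo "FAIL: could not generate a credential" >&2
        return 1
    fi
    echo "OK: credential generated"

    # 2. Decode a fresh credential on the local host.
    if ! munge -n | unmunge >/dev/null; then
        echo "FAIL: local decode" >&2
        return 2
    fi
    echo "OK: local decode"

    # 3. Decode a fresh credential on the remote host, if one was given.
    #    Needs passwordless ssh, as noted above.
    if [ -n "$remote_host" ]; then
        if ! munge -n | ssh "$remote_host" unmunge >/dev/null; then
            echo "FAIL: remote decode on $remote_host" >&2
            return 3
        fi
        echo "OK: remote decode on $remote_host"
    fi
    return 0
}
```

After sourcing the function, run it as the non-privileged user, for example
"check_munge node01" (node01 is a placeholder hostname). A return value of 3
with the first two checks passing suggests the two hosts disagree about the
credential, for example because of differing keys or clocks.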

On Mon, 16 Dec 2019 at 20:59, Dean Schulze <dean.w.schulze at gmail.com> wrote:

> I have my controller running (slurmctld and slurmdbd) and my controller
> and node host can ping each other by name so they resolve via /etc/hosts
> settings.  When I try to start the slurmd.service it shows that it is
> active (running), but gives these errors:
>
> Unable to register: Zero Bytes were transmitted or received
>
> The controller shows this from slurmctld.service:
>
>     Munge decode failed: Invalid credential
>
> I copied the munge.key from controller to node (copying via an NFS shared
> directory required changing ownership and permissions and then changing
> them back).
>
> Apparently the node is communicating with the controller, but munge thinks
> I have a bad credential.
>
> Any idea how to troubleshoot this?