[slurm-users] Munge decode failing on new node

Dean Schulze dean.w.schulze at gmail.com
Fri Apr 17 21:22:00 UTC 2020


Both checks work.  The only discrepancy is that the output on the slurm
controller included these two lines:

UID:              ??? (1000)
GID:              ??? (1000)

It looks as if the controller doesn't know the username for UID 1000.

But the command still returned success (0).
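
A quick way to confirm that reading of the ??? lines is to ask the name
service on the controller directly.  This is only a minimal sketch: the
resolve_id helper is a hypothetical name of my own, and I am assuming that
??? means the ID has no passwd/group entry on that host.

```shell
#!/bin/sh
# Check whether a numeric UID/GID maps to a name on this host.
# Assumption: "???" in the unmunge output means no such mapping exists here.

# resolve_id DATABASE KEY
# Prints the name for KEY from the given getent database (passwd or group),
# or "???" with a non-zero status when there is no entry.
resolve_id() {
    entry=$(getent "$1" "$2") || { echo "???"; return 1; }
    # The name is the first colon-separated field of the entry.
    echo "${entry%%:*}"
}

# 1000 is the UID/GID shown in the unmunge output above.
echo "UID 1000 -> $(resolve_id passwd 1000)"
echo "GID 1000 -> $(resolve_id group 1000)"
```

Running this on both the controller and a working node and comparing the
two results should show whether the account exists on one but not the other.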

On Fri, Apr 17, 2020 at 2:00 PM Riebs, Andy <andy.riebs at hpe.com> wrote:

> A couple of quick checks to see if the problem is munge:
>
> 1. On the problem node, try
> $ echo foo | munge | unmunge
>
> 2. If (1) works, try this from the node running slurmctld to the
> problem node
> slurm-node$ echo foo | ssh node munge | unmunge
>
>
>
> *From:* slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] *On
> Behalf Of *Dean Schulze
> *Sent:* Friday, April 17, 2020 3:40 PM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] Munge decode failing on new node
>
>
>
> There is no ntp service running on any of my nodes, and all but this one
> is working.  I haven't heard that ntp is a requirement for slurm, just that
> the time be synchronized across the cluster.  And it is.
>
>
>
> On Wed, Apr 15, 2020 at 12:17 PM Carlos Fenoy <minibit at gmail.com> wrote:
>
> I’d check ntp, as your encoding time seems odd to me.
>
>
>
> On Wed, 15 Apr 2020 at 19:59, Dean Schulze <dean.w.schulze at gmail.com>
> wrote:
>
> I've installed two new nodes onto my slurm cluster.  One node works, but
> the other one complains about an invalid credential for munge.  I've
> verified that the munge.key is the same as on all other nodes with
>
>
> sudo cksum /etc/munge/munge.key
>
>
>
> I recopied a munge.key from a node that works.  I've verified that munge
> uid and gid are the same on the nodes.  The time is in sync on all nodes.
>
>
>
> Here is what is in the slurmd.log:
>
>
>
>  error: Unable to register: Unable to contact slurm controller (connect failure)
>  error: Munge decode failed: Invalid credential
>  ENCODED: Wed Dec 31 17:00:00 1969
>  DECODED: Wed Dec 31 17:00:00 1969
>  error: authentication: Invalid authentication credential
>  error: slurm_receive_msg_and_forward: Protocol authentication error
>  error: service_connection: slurm_receive_msg: Protocol authentication error
>  error: Unable to register: Unable to contact slurm controller (connect failure)
>
>
>
> I've checked in the munged.log and all it says is
>
>
>
> Invalid credential
>
>
>
> Thanks for your help
>
> --
>
> --
> Carles Fenoy
>
>