[slurm-users] Munge decode failing on new node

Dean Schulze dean.w.schulze at gmail.com
Fri Apr 17 21:38:35 UTC 2020


Just noticed this.  On the problem node, munged.log gets a new entry every
1:40 (100 seconds):

2020-04-17 15:31:02 -0600 Info:      Invalid credential
2020-04-17 15:32:42 -0600 Info:      Invalid credential
2020-04-17 15:34:22 -0600 Info:      Invalid credential

The same message shows up on the failed node and on two other nodes that
work.  Two more working nodes (including the controller) don't log it.
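
To check the interval on the other nodes, something like the rough sketch
below could print the gap in seconds between consecutive "Invalid credential"
entries.  It assumes GNU date and the timestamp layout shown in the entries
above; the function name munge_invalid_gaps is only illustrative.

```shell
# Print the gap, in seconds, between consecutive "Invalid credential"
# entries in a munged log file.
# Assumptions: GNU date (for -d), and log lines that start with
# "YYYY-MM-DD HH:MM:SS +ZZZZ" as in the entries quoted above.
munge_invalid_gaps() {
    log="$1"
    prev=""
    grep 'Invalid credential' "$log" | while read -r day time zone _rest; do
        # Convert the timestamp to epoch seconds; skip lines that don't parse.
        now=$(date -d "$day $time $zone" +%s) || continue
        if [ -n "$prev" ]; then
            echo $((now - prev))
        fi
        prev=$now
    done
}
```

Run as "munge_invalid_gaps /var/log/munge/munged.log" (that path is a guess at
the usual default; adjust it to wherever munged logs on your system).  For the
three entries above it would print 100 twice.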



On Fri, Apr 17, 2020 at 2:00 PM Riebs, Andy <andy.riebs at hpe.com> wrote:

> A couple of quick checks to see if the problem is munge:
>
> 1. On the problem node, try
> $ echo foo | munge | unmunge
>
> 2. If (1) works, try this from the node running slurmctld to the
> problem node
> slurm-node$ echo foo | ssh node munge | unmunge
>
>
>
> *From:* slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] *On
> Behalf Of *Dean Schulze
> *Sent:* Friday, April 17, 2020 3:40 PM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] Munge decode failing on new node
>
>
>
> There is no ntp service running on any of my nodes, and all but this one
> is working.  I haven't heard that ntp is a requirement for slurm, just that
> the time be synchronized across the cluster.  And it is.
>
>
>
> On Wed, Apr 15, 2020 at 12:17 PM Carlos Fenoy <minibit at gmail.com> wrote:
>
> I’d check ntp as your encoding time seems odd to me
>
>
>
> On Wed, 15 Apr 2020 at 19:59, Dean Schulze <dean.w.schulze at gmail.com>
> wrote:
>
> I've installed two new nodes onto my slurm cluster.  One node works, but
> the other one complains about an invalid credential for munge.  I've
> verified that the munge.key is the same as on all other nodes with
>
>
> sudo cksum /etc/munge/munge.key
>
>
>
> I recopied a munge.key from a node that works.  I've verified that munge
> uid and gid are the same on the nodes.  The time is in sync on all nodes.
>
>
>
> Here is what is in the slurmd.log:
>
>
>
>  error: Unable to register: Unable to contact slurm controller (connect failure)
>  error: Munge decode failed: Invalid credential
>  ENCODED: Wed Dec 31 17:00:00 1969
>  DECODED: Wed Dec 31 17:00:00 1969
>  error: authentication: Invalid authentication credential
>  error: slurm_receive_msg_and_forward: Protocol authentication error
>  error: service_connection: slurm_receive_msg: Protocol authentication error
>  error: Unable to register: Unable to contact slurm controller (connect failure)
>
>
>
> I've checked in the munged.log and all it says is
>
>
>
> Invalid credential
>
>
>
> Thanks for your help
>
> --
>
> --
> Carles Fenoy
>
>