[slurm-users] Munge decode failing on new node

dean.w.schulze at gmail.com dean.w.schulze at gmail.com
Fri May 15 02:39:08 UTC 2020


The problem turned out to be that the new node was on a different subnet from the other nodes.  Once our network admin opened ports 6817, 6818, and 6188 between the subnets, the new node worked.
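A blocked port between subnets can be spotted before involving munge at all. The following is a minimal sketch, assuming bash and the coreutils `timeout` command are available; the function names and the hostname `slurm-controller` are hypothetical, and the port numbers are simply the ones listed in the message above.

```shell
#!/usr/bin/env bash
# Report whether a TCP connection to host:port succeeds within 3 seconds.
# Uses bash's built-in /dev/tcp redirection, so no extra tools are needed.
check_port() {
    local host=$1 port=$2
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "${host}:${port} open"
    else
        echo "${host}:${port} closed or filtered"
        return 1
    fi
}

# Check a list of ports against one host; returns non-zero if any check fails.
check_ports() {
    local host=$1 port rc=0
    shift
    for port in "$@"; do
        check_port "$host" "$port" || rc=1
    done
    return "$rc"
}

# Example (hypothetical hostname; ports as listed in the message above):
#   check_ports slurm-controller 6817 6818 6188
```

Running it from the new node toward the controller, and from the controller toward the new node, covers both directions. A port reported as "closed or filtered" may also just mean that no daemon is listening there, so the result is a hint rather than proof of a firewall rule.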

 

Thanks for all the responses.

 

From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Riebs, Andy
Sent: Friday, April 17, 2020 1:58 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Munge decode failing on new node

 

A couple of quick checks to see if the problem is munge:

1.	On the problem node, try:
	$ echo foo | munge | unmunge
2.	If (1) works, try this from the node running slurmctld against the problem node:
	slurm-node$ echo foo | ssh node munge | unmunge
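The second check encodes on the remote node and decodes locally; the reverse direction can be tested the same way. Here is a rough sketch that runs both, assuming passwordless ssh to the node and a running munged on each end. The function name `munge_cross_check` is made up for illustration; `munge -n` (encode with no payload) and `unmunge` are the standard MUNGE client commands.

```shell
#!/usr/bin/env bash
# Test MUNGE credentials in both directions between this host and a remote node.
# Assumes passwordless ssh to the node and munged running on both ends.
munge_cross_check() {
    local node=$1 rc=0

    # Encode on the remote node, decode locally.
    if ssh "$node" munge -n </dev/null | unmunge >/dev/null; then
        echo "remote encode -> local decode: ok"
    else
        echo "remote encode -> local decode: FAILED"
        rc=1
    fi

    # Encode locally, decode on the remote node.
    if munge -n | ssh "$node" unmunge >/dev/null; then
        echo "local encode -> remote decode: ok"
    else
        echo "local encode -> remote decode: FAILED"
        rc=1
    fi
    return "$rc"
}

# Example (hypothetical node name):
#   munge_cross_check node
```

If both directions pass, the key and the clocks are probably fine, and the fault more likely lies in how the Slurm daemons reach each other.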

 

From: slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] On Behalf Of Dean Schulze
Sent: Friday, April 17, 2020 3:40 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Munge decode failing on new node

 

There is no ntp service running on any of my nodes, and all but this one are working.  I haven't heard that ntp is a requirement for slurm, only that the time must be synchronized across the cluster.  And it is.

 

On Wed, Apr 15, 2020 at 12:17 PM Carlos Fenoy <minibit at gmail.com> wrote:

I’d check ntp, as your encoding time seems odd to me.

 

On Wed, 15 Apr 2020 at 19:59, Dean Schulze <dean.w.schulze at gmail.com> wrote:

I've added two new nodes to my slurm cluster.  One node works, but the other one complains about an invalid munge credential.  I've verified that the munge.key is the same as on all other nodes with:


sudo cksum /etc/munge/munge.key

 

I recopied a munge.key from a node that works.  I've verified that the munge uid and gid are the same on all nodes.  The time is in sync on all nodes.
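The three checks described here (key checksum, munge uid and gid, clock) can be collected from every node in one pass. This is a sketch only, assuming passwordless ssh and sudo on each node; the function names and the node names in the example are placeholders.

```shell
#!/usr/bin/env bash
# Print one line per node: hostname, key checksum, munge uid:gid, clock in epoch seconds.
# Assumes passwordless ssh and sudo on every node.
munge_node_report() {
    local node
    for node in "$@"; do
        ssh "$node" 'printf "%s %s %s:%s %s\n" \
            "$(hostname)" \
            "$(sudo cksum /etc/munge/munge.key | cut -d" " -f1)" \
            "$(id -u munge)" "$(id -g munge)" \
            "$(date +%s)"' </dev/null
    done
}

# Read report lines on stdin and print the hosts whose checksum or uid:gid
# differ from the first line, which is treated as the known-good node.
report_mismatches() {
    awk 'NR == 1 { key = $2; ids = $3; next }
         $2 != key || $3 != ids { print $1 }'
}

# Example (hypothetical node names, known-good node first):
#   munge_node_report goodnode newnode1 newnode2 | report_mismatches
```

The clock column is left for eyeballing, since the nodes are queried one after another and the values will differ by the ssh round-trip time.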

 

Here is what is in the slurmd.log:

 

 error: Unable to register: Unable to contact slurm controller (connect failure)
 error: Munge decode failed: Invalid credential
 ENCODED: Wed Dec 31 17:00:00 1969
 DECODED: Wed Dec 31 17:00:00 1969
 error: authentication: Invalid authentication credential
 error: slurm_receive_msg_and_forward: Protocol authentication error
 error: service_connection: slurm_receive_msg: Protocol authentication error
 error: Unable to register: Unable to contact slurm controller (connect failure)
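The ENCODED and DECODED times in this log are worth a second look: "Wed Dec 31 17:00:00 1969" is what Unix timestamp 0 looks like in a zone seven hours behind UTC. The sketch below reproduces it, assuming GNU `date`; the zone `MST7` is a guess at the poster's time zone, and `format_epoch` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Format a Unix timestamp in the same layout as the ENCODED/DECODED log lines.
# MST7 is a POSIX zone string meaning UTC-7; the poster's actual zone is an assumption.
format_epoch() {
    LC_ALL=C TZ=${2:-MST7} date -d "@$1" '+%a %b %e %H:%M:%S %Y'
}

format_epoch 0    # prints: Wed Dec 31 17:00:00 1969
```

A timestamp of zero suggests the fields were never filled in because the credential could not be decoded at all, which points away from clock skew as the cause.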

 

I've checked munged.log, and all it says is:

 

Invalid credential 

 

Thanks for your help

-- 

--
Carles Fenoy
