[slurm-users] [EXTERNAL] Re: Munge decode failing on new node

Sean Crosby scrosby at unimelb.edu.au
Wed Apr 15 23:00:55 UTC 2020


Who owns the munge directory and key? Is it the right uid/gid? Is the munge
daemon running?
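
A minimal sketch of those checks, assuming GNU coreutils `stat` and the paths mentioned elsewhere in this thread. The helper names `describe_path` and `has_mode` are made up for illustration, and the systemd unit name in the usage comments is an assumption that may differ by distribution.

```shell
#!/bin/sh
# Print "<octal mode> <owner>:<group> <path>" for one path.
# Assumes GNU stat (-c FORMAT); BSD stat uses different flags.
describe_path() {
    stat -c '%a %U:%G %n' "$1"
}

# Succeed (exit 0) only if the path has exactly the given octal mode.
has_mode() {
    [ "$(stat -c '%a' "$1")" = "$2" ]
}

# Example usage on the problem node (run as root, since /etc/munge is 700):
#   describe_path /etc/munge
#   describe_path /etc/munge/munge.key
#   has_mode /etc/munge 700 && has_mode /etc/munge/munge.key 400 && echo "modes OK"
#   systemctl is-active munge   # assumption: the daemon runs as a systemd unit named "munge"
```

The mode alone does not settle it: the owner and group printed by `describe_path` should be the munge user on every node, which is why the uid/gid question matters.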

--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia



On Thu, 16 Apr 2020 at 04:57, Dean Schulze <dean.w.schulze at gmail.com> wrote:

> /etc/munge is 700
> /etc/munge/munge.key is 400
>
>
>
> On Wed, Apr 15, 2020 at 12:11 PM Riebs, Andy <andy.riebs at hpe.com> wrote:
>
>> Two trivial things to check:
>>
>> 1. Permissions on /etc/munge and /etc/munge/munge.key
>>
>> 2. Is munged running on the problem node?
>>
>>
>>
>> Andy
>>
>>
>>
>> *From:* slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] *On
>> Behalf Of *Dean Schulze
>> *Sent:* Wednesday, April 15, 2020 1:57 PM
>> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
>> *Subject:* [slurm-users] Munge decode failing on new node
>>
>>
>>
>> I've installed two new nodes onto my slurm cluster.  One node works, but
>> the other one complains about an invalid credential for munge.  I've
>> verified that the munge.key is the same as on all other nodes with
>>
>>
>> sudo cksum /etc/munge/munge.key
>>
>>
>>
>> I recopied a munge.key from a node that works.  I've verified that munge
>> uid and gid are the same on the nodes.  The time is in sync on all nodes.
>>
>>
>>
>> Here is what is in the slurmd.log:
>>
>>
>>
>>  error: Unable to register: Unable to contact slurm controller (connect failure)
>>  error: Munge decode failed: Invalid credential
>>  ENCODED: Wed Dec 31 17:00:00 1969
>>  DECODED: Wed Dec 31 17:00:00 1969
>>  error: authentication: Invalid authentication credential
>>  error: slurm_receive_msg_and_forward: Protocol authentication error
>>  error: service_connection: slurm_receive_msg: Protocol authentication error
>>  error: Unable to register: Unable to contact slurm controller (connect failure)
>>
>>
>>
>> I've checked in the munged.log and all it says is
>>
>>
>>
>> Invalid credential
>>
>>
>>
>> Thanks for your help
>>
>
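
One detail in the quoted slurmd.log: the ENCODED and DECODED times are both Wed Dec 31 17:00:00 1969, which is Unix time 0 rendered in a UTC-7 time zone. The sketch below reproduces that string, assuming GNU `date`. The helper name `format_epoch` is hypothetical, and `MST7` is a POSIX TZ string picked only because it matches the logged value. Reading this as "the timestamps were never filled in because the decode failed" is an inference, not something the log states.

```shell
#!/bin/sh
# Format a Unix timestamp in the same layout as the ENCODED/DECODED log lines.
# $1 = seconds since the epoch, $2 = POSIX TZ string (MST7 means UTC-7).
# Assumes GNU date, which accepts -d @SECONDS.
format_epoch() {
    LC_ALL=C TZ="$2" date -d "@$1" '+%a %b %e %H:%M:%S %Y'
}

format_epoch 0 MST7   # prints: Wed Dec 31 17:00:00 1969
```

If that reading is right, these two lines say nothing about clock skew between the nodes; they only show that no valid credential was decoded.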