[slurm-users] Munge decode failing on new node

Dean Schulze dean.w.schulze at gmail.com
Wed Apr 22 19:56:49 UTC 2020


There is a third user account on all machines in the cluster: the account
used to work on the cluster.  That account has uid 1000 on all four worker
nodes, but uid 1001 on the controller.  That is probably why the controller
output shows the question marks.

I doubt this is the issue, though, because the 3 of the 4 nodes that work
have the same uid mismatch for that user.  I don't think the slurm or munge
user is the issue either.


-----Original Message-----
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Chris
Samuel
Sent: Monday, April 20, 2020 12:03 AM
To: slurm-users at lists.schedmd.com
Subject: Re: [slurm-users] Munge decode failing on new node

On Friday, 17 April 2020 2:22:00 PM PDT Dean Schulze wrote:

> Both work.  The only discrepancy is that the slurm controller output 
> had these two lines:
> 
> UID:              ??? (1000)
> GID:              ??? (1000)
> 
> Like the controller doesn't know the username for UID 1000.

What does this say on the controller and the compute node?

getent passwd 1000

Are you using LDAP or the like to ensure that all nodes have the same user
database?
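
To compare every account at once rather than one uid at a time, something
like the following could help. It is only a rough sketch: the function name
uid_mismatches is made up for illustration, and it assumes you feed it
"getent passwd" output with the host name prepended to each line (a
convention chosen here, not anything Slurm or MUNGE defines). It prints
each account whose numeric uid is not the same on every host.

```shell
#!/bin/sh
# Hypothetical helper: list accounts whose uid differs between hosts.
# Input (stdin): getent passwd lines prefixed with the host name, i.e.
#   host:name:passwd:uid:gid:gecos:home:shell
# Output: one line per mismatched account, "name: host=uid host=uid ..."
uid_mismatches() {
    awk -F: '
        NF >= 4 {
            name = $2; id = $4
            # Remember every host=uid pair seen for this account.
            seen[name] = seen[name] " " $1 "=" id
            if (!(name in first))
                first[name] = id          # first uid seen for this account
            else if (first[name] != id)
                differs[name] = 1         # a later host disagrees
        }
        END {
            for (name in differs)
                print name ":" seen[name]
        }
    ' | sort
}

# Possible usage, assuming passwordless ssh and the made-up host names
# "controller" and "worker1":
#
#   for h in controller worker1; do
#       ssh "$h" getent passwd | sed "s/^/$h:/"
#   done | uid_mismatches
```

Note that "getent passwd" without a key may not list LDAP users on setups
where enumeration is disabled, so an empty result is not proof that the
user databases agree.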

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA







