[slurm-users] What does this error mean?

Valerio Bellizzomi valerio at selnet.org
Wed Jun 26 14:31:57 UTC 2019


On Wed, 2019-06-26 at 08:23 +0200, Marcus Wagner wrote:
> Have you restarted munge on all hosts?

Now it works, thanks.
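
For anyone hitting the same "Invalid credentials" error, here is a minimal sketch of the check behind this fix. It assumes both copies of the key have been fetched to local paths; the function name keys_match and the example paths are hypothetical, and the commands in the trailing comments are the usual MUNGE checks, shown for reference only and not run by the block.

```shell
#!/bin/sh
# Sketch only: compare two copies of a MUNGE key byte for byte.
# keys_match is a hypothetical helper name, not part of MUNGE or Slurm.
keys_match() {
    # cmp -s is silent; it exits 0 when the files are identical,
    # 1 when they differ, and 2 when a file cannot be read.
    cmp -s "$1" "$2"
}

# Typical follow-up once the keys are identical (run as root on EVERY host,
# because a running munged keeps using the key it loaded at startup):
#   systemctl restart munge
#   munge -n | unmunge            # local encode/decode round trip
#   munge -n | ssh NODE unmunge   # NODE is a placeholder for a remote host
```

Usage would look like `keys_match ./key.controller ./key.node && echo "keys identical"`.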

> 
> On 6/25/19 4:38 PM, Valerio Bellizzomi wrote:
> > On Tue, 2019-06-25 at 16:32 +0200, Valerio Bellizzomi wrote:
> >> On Tue, 2019-06-25 at 08:48 -0400, Eli V wrote:
> >>> My first guess would be that the host is not listed as one of the two
> >>> controllers in the slurm.conf. Also, keep in mind that munge, and thus
> >>> slurm, is very sensitive to a lack of clock synchronization between
> >>> nodes. FYI, I run a hand-built slurm 18.08.07 on Debian 8 & 9 without
> >>> issues. Haven't tried 10 yet.
> >> I have discovered that Slurm is also sensitive to computer names.
> >> The controller was listed, but with a dot and a domain name. I removed
> >> the dot and domain name, and that resolved the problem.
> >>
> >> Now I have another problem: the slurmd on the compute node refuses to
> >> connect to the controller, with this error: Protocol authentication error
> >
> > The exact error on the controller is "Invalid credentials". I have
> > copied the munge.key to both hosts, but the error persists.
> >
> >
> >>>
> >>> On Tue, Jun 25, 2019 at 1:50 AM Valerio Bellizzomi <valerio at selnet.org> wrote:
> >>>> I have installed slurmctld on Debian Testing and am trying to start
> >>>> the daemon by hand:
> >>>>
> >>>>
> >>>>
> >>>> # /usr/sbin/slurmctld -D -v -f /etc/slurm-llnl/slurm.conf
> >>>> slurmctld: error: High latency for 1000 calls to gettimeofday(): 2072 microseconds
> >>>> slurmctld: pidfile not locked, assuming no running daemon
> >>>> slurmctld: slurmctld version 18.08.5-2 started on cluster selroc
> >>>> slurmctld: Munge cryptographic signature plugin loaded
> >>>> slurmctld: error: This host (master02/master02) not a valid controller
> >>>>
> >>>>
> >>>>
> >>>> Thanks
> >>>>
> >>>>
> >>>>
> >>
> >>
> >>
> >
> >
> >
> 
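
The "not a valid controller" error quoted above came down to the name in slurm.conf not matching the host's own name. A rough sketch of that check follows. It assumes an exact string comparison (a simplification of whatever slurmctld does internally) and one key=value pair per line; the function names controller_names and is_listed_controller are hypothetical. SlurmctldHost, ControlMachine and BackupController are the slurm.conf keys that name controllers in the 18.08 series.

```shell
#!/bin/sh
# Sketch only: list the controller names found in a slurm.conf and test
# whether a given host name is among them. Helper names are hypothetical.
controller_names() {
    # $1 = path to slurm.conf
    # Drop '#' comments, then print the value of each controller key,
    # stripping an optional "(address)" suffix and any whitespace.
    sed -e 's/#.*//' "$1" |
    awk -F= 'tolower($1) ~ /^[ \t]*(slurmctldhost|controlmachine|backupcontroller)[ \t]*$/ {
        name = $2
        sub(/\(.*/, "", name)
        gsub(/[ \t]/, "", name)
        print name
    }'
}

is_listed_controller() {
    # $1 = host name to test, $2 = path to slurm.conf
    # -x matches the whole line, -F treats the name as a fixed string.
    controller_names "$2" | grep -qxF -- "$1"
}
```

Usage would look like `is_listed_controller "$(hostname -s)" /etc/slurm-llnl/slurm.conf || echo "short host name not listed"`. With a fully qualified name in the file and a short name on the host, the check fails, which is the mismatch described in this thread.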





