[slurm-users] Problem with slurmctld communication with slurmdbd

Barbara Krašovec barbara.krasovec at ijs.si
Wed Nov 29 06:46:25 MST 2017


I was struggling like crazy with this one a while ago.
Then I saw this in the slurm.conf man page:

AccountingStoragePass
The password used to gain access to the database to store the accounting data. Only used for database type storage plugins, ignored otherwise. In the case of Slurm DBD (Database Daemon) with MUNGE authentication this can be configured to use a MUNGE daemon specifically configured to provide authentication between clusters while the default MUNGE daemon provides authentication within a cluster. In that case, AccountingStoragePass should specify the named port to be used for communications with the alternate MUNGE daemon (e.g. "/var/run/munge/global.socket.2"). The default value is NULL. Also see DefaultStoragePass.

So if you are using MUNGE, leave AccountingStoragePass out of slurm.conf: any value you put there is treated as the path to a MUNGE socket, and when it is omitted the default socket is used. Specify the database password only in slurmdbd.conf (StoragePass).
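To make the point concrete, here is a rough Python sketch that scans slurm.conf text and flags an AccountingStoragePass value that does not look like a socket path when slurmdbd accounting and MUNGE authentication are in use. It is not part of Slurm: `parse_conf` and `check_accounting_pass` are hypothetical helper names, and the sketch assumes one Key=Value pair per line, no "#" inside values, and AuthType defaulting to auth/munge when unset. Real slurm.conf parsing is more permissive than this.

```python
def parse_conf(text):
    """Parse slurm.conf-style 'Key=Value' lines into a dict with lower-cased keys.

    Simplified (hypothetical) parser: strips '#' comments, skips blank lines,
    splits on the first '=' only, and assumes one pair per line.
    """
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # slurm.conf keys are case-insensitive, so normalise them.
        conf[key.strip().lower()] = value.strip()
    return conf


def check_accounting_pass(text):
    """Return warnings about AccountingStoragePass (hypothetical helper)."""
    conf = parse_conf(text)
    warnings = []
    storage_type = conf.get("accountingstoragetype", "").lower()
    # Assumption: an unset AuthType means the default, auth/munge.
    auth_type = conf.get("authtype", "auth/munge").lower()
    value = conf.get("accountingstoragepass")
    if (storage_type == "accounting_storage/slurmdbd"
            and auth_type == "auth/munge"
            and value
            and not value.startswith("/")):
        # With slurmdbd + MUNGE the value is used as a MUNGE socket path,
        # so something like a database password will not work here.
        warnings.append(
            f'AccountingStoragePass="{value}" will be used as a MUNGE socket '
            "path; remove it from slurm.conf and put the database password "
            "in slurmdbd.conf (StoragePass) instead"
        )
    return warnings


example = """AccountingStorageType=accounting_storage/slurmdbd
AccountingStoragePass=magic
"""
for warning in check_accounting_pass(example):
    print(warning)
```

The check only looks at whether the value starts with "/", because a value that is a path is the legitimate case described in the man page (an alternate MUNGE socket).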

Cheers,
Barbara

> On 29 Nov 2017, at 14:28, Andy Riebs <andy.riebs at hpe.com> wrote:
> 
> It looks like you don't have the munged daemon running.
> 
> On 11/29/2017 08:01 AM, Bruno Santos wrote:
>> Hi everyone,
>> 
>> I have set up Slurm to use slurm_db and all was working fine. However, I had to change slurm.conf to play with user priority, and upon restarting, slurmctld fails with the messages below. It seems that it is somehow trying to use the MySQL password as a MUNGE socket?
>> Any idea how to solve it?
>> 
>> Nov 29 12:56:30 plantae slurmctld[29613]: Registering slurmctld at port 6817 with slurmdbd.
>> Nov 29 12:56:32 plantae slurmctld[29613]: error: If munged is up, restart with --num-threads=10
>> Nov 29 12:56:32 plantae slurmctld[29613]: error: Munge encode failed: Failed to access "magic": No such file or directory
>> Nov 29 12:56:32 plantae slurmctld[29613]: error: authentication: Socket communication error
>> Nov 29 12:56:32 plantae slurmctld[29613]: error: slurm_persist_conn_open: failed to send persistent connection init message to localhost:6819
>> Nov 29 12:56:32 plantae slurmctld[29613]: error: slurmdbd: Sending PersistInit msg: Protocol authentication error
>> Nov 29 12:56:34 plantae slurmctld[29613]: error: If munged is up, restart with --num-threads=10
>> Nov 29 12:56:34 plantae slurmctld[29613]: error: Munge encode failed: Failed to access "magic": No such file or directory
>> Nov 29 12:56:34 plantae slurmctld[29613]: error: authentication: Socket communication error
>> Nov 29 12:56:34 plantae slurmctld[29613]: error: slurm_persist_conn_open: failed to send persistent connection init message to localhost:6819
>> Nov 29 12:56:34 plantae slurmctld[29613]: error: slurmdbd: Sending PersistInit msg: Protocol authentication error
>> Nov 29 12:56:36 plantae slurmctld[29613]: error: If munged is up, restart with --num-threads=10
>> Nov 29 12:56:36 plantae slurmctld[29613]: error: Munge encode failed: Failed to access "magic": No such file or directory
>> Nov 29 12:56:36 plantae slurmctld[29613]: error: authentication: Socket communication error
>> Nov 29 12:56:36 plantae slurmctld[29613]: error: slurm_persist_conn_open: failed to send persistent connection init message to localhost:6819
>> Nov 29 12:56:36 plantae slurmctld[29613]: error: slurmdbd: Sending PersistInit msg: Protocol authentication error
>> Nov 29 12:56:36 plantae slurmctld[29613]: fatal: It appears you don't have any association data from your database.  The priority/multifactor plugin requires this information to run correctly.  Please check your database connection and try again.
>> Nov 29 12:56:36 plantae systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
>> Nov 29 12:56:36 plantae systemd[1]: slurmctld.service: Unit entered failed state.
>> Nov 29 12:56:36 plantae systemd[1]: slurmctld.service: Failed with result 'exit-code'.
>> 
>> 
> 
