<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hello,<div class=""><br class=""></div><div class="">Does munge work?</div><div class="">Check whether decoding works locally:</div><div class=""><span style="font-family: monospace;" class="">munge -n | unmunge</span></div><div class="">Check whether decoding works remotely:</div><div class=""><code class="">munge -n | ssh &lt;somehost_in_cluster&gt; unmunge</code></div><div class=""><code class=""><br class=""></code></div><div class=""><code class=""><font face="Helvetica" class="">It looks as though the munge keys do not match.</font></code></div><div class=""><br class=""></div><div class="">See my comments inline.<br class=""><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On 29 Nov 2017, at 14:40, Bruno Santos &lt;<a href="mailto:bacmsantos@gmail.com" class="">bacmsantos@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">I actually just managed to figure that one out. <div class=""><br class=""></div><div class="">The problem was that I had set AccountingStoragePass=magic in the slurm.conf file. After re-reading the documentation, it seems this is only needed if a different munge instance controls the logins to the database, which is not my case. 
</div><div class="">So commenting that line out seems to have worked however I am now getting a different error: </div><div class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Nov 29 13:19:20 plantae slurmctld[29984]: Registering slurmctld at port 6817 with slurmdbd.<br class="">Nov 29 13:19:20 plantae slurmctld[29984]: error: slurm_persist_conn_open: Something happened with the receiving/processing of the persistent connection init message to localhost:6819: Initial RPC not DBD_INIT<br class="">Nov 29 13:19:20 plantae systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE<br class="">Nov 29 13:19:20 plantae systemd[1]: slurmctld.service: Unit entered failed state.<br class="">Nov 29 13:19:20 plantae systemd[1]: slurmctld.service: Failed with result 'exit-code'.</blockquote><div class=""><br class=""></div><div class="">My slurm.conf looks like this</div><div class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"># LOGGING AND ACCOUNTING<br class="">AccountingStorageHost=localhost<br class="">AccountingStorageLoc=slurm_db<br class="">#AccountingStoragePass=magic<br class="">#AccountingStoragePort=<br class="">AccountingStorageType=accounting_storage/slurmdbd<br class="">AccountingStorageUser=slurm<br class="">AccountingStoreJobComment=YES<br class="">ClusterName=research<br class="">JobCompType=jobcomp/none<br class="">JobAcctGatherFrequency=30<br class="">JobAcctGatherType=jobacct_gather/none<br class="">SlurmctldDebug=3<br class="">SlurmdDebug=3</blockquote></div></div></div></div></blockquote><div><br class=""></div>You only need:</div><div><div>AccountingStorageEnforce=associations,limits,qos</div><div>AccountingStorageHost=<hostname></div><div>AccountingStorageType=accounting_storage/slurmdbd</div><div><br class=""></div><div>You can remove AccountingStorageLoc and 
AccountingStorageUser.</div><div><br class=""></div><div><br class=""></div><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div class=""><div class=""><br class=""></div><div class="">And slurmdbd.conf looks like this:</div><div class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">ArchiveEvents=yes<br class="">ArchiveJobs=yes<br class="">ArchiveResvs=yes<br class="">ArchiveSteps=no<br class="">#ArchiveTXN=no<br class="">#ArchiveUsage=no<br class=""># Authentication info<br class="">AuthType=auth/munge<br class="">AuthInfo=/var/run/munge/munge.socket.2</blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">#Database info<br class=""># slurmDBD info<br class="">DbdAddr=plantae<br class="">DbdHost=plantae<br class=""># Database info<br class="">StorageType=accounting_storage/mysql<br class="">StorageHost=localhost<br class="">SlurmUser=slurm<br class="">StoragePass=magic<br class="">StorageUser=slurm<br class="">StorageLoc=slurm_db</blockquote></div><div class=""><br class=""></div><div class=""><br class=""></div></div><div class="">Thank you very much in advance. </div><div class=""><br class=""></div><div class="">Best,</div><div class="">Bruno </div></div></div></div></blockquote><div><br class=""></div>Cheers,</div><div>Barbara</div><div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div class=""><br class=""></div></div></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On 29 November 2017 at 13:28, Andy Riebs <span dir="ltr" class="">&lt;<a href="mailto:andy.riebs@hpe.com" target="_blank" class="">andy.riebs@hpe.com</a>&gt;</span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000" class="">
It looks like you don't have the munged daemon running.<div class=""><div class="h5"><br class="">
<br class="">
<div class="m_4940209100258878838moz-cite-prefix">On 11/29/2017 08:01 AM, Bruno Santos
wrote:<br class="">
</div>
<blockquote type="cite" class="">
<div dir="ltr" class="">Hi everyone,
<div class=""><br class="">
</div>
<div class="">I have set-up slurm to use slurm_db and all was working
fine. However I had to change the slurm.conf to play with user
priority and upon restarting the slurmctl is fails with the
following messages below. It seems that somehow is trying to
use the mysql password as a munge socket? </div>
<div class="">Any idea how to solve it? </div>
<div class=""> </div>
<div class="">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Nov 29 12:56:30 plantae
slurmctld[29613]: Registering slurmctld at port 6817 with
slurmdbd.<br class="">
Nov 29 12:56:32 plantae slurmctld[29613]: error: If munged
is up, restart with --num-threads=10<br class="">
Nov 29 12:56:32 plantae slurmctld[29613]: error: Munge
encode failed: Failed to access "magic": No such file or
directory<br class="">
Nov 29 12:56:32 plantae slurmctld[29613]: error:
authentication: Socket communication error<br class="">
Nov 29 12:56:32 plantae slurmctld[29613]: error:
slurm_persist_conn_open: failed to send persistent
connection init message to localhost:6819<br class="">
Nov 29 12:56:32 plantae slurmctld[29613]: error: slurmdbd:
Sending PersistInit msg: Protocol authentication error<br class="">
Nov 29 12:56:34 plantae slurmctld[29613]: error: If munged
is up, restart with --num-threads=10<br class="">
Nov 29 12:56:34 plantae slurmctld[29613]: error: Munge
encode failed: Failed to access "magic": No such file or
directory<br class="">
Nov 29 12:56:34 plantae slurmctld[29613]: error:
authentication: Socket communication error<br class="">
Nov 29 12:56:34 plantae slurmctld[29613]: error:
slurm_persist_conn_open: failed to send persistent
connection init message to localhost:6819<br class="">
Nov 29 12:56:34 plantae slurmctld[29613]: error: slurmdbd:
Sending PersistInit msg: Protocol authentication error<br class="">
Nov 29 12:56:36 plantae slurmctld[29613]: error: If munged
is up, restart with --num-threads=10<br class="">
Nov 29 12:56:36 plantae slurmctld[29613]: error: Munge
encode failed: Failed to access "magic": No such file or
directory<br class="">
Nov 29 12:56:36 plantae slurmctld[29613]: error:
authentication: Socket communication error<br class="">
Nov 29 12:56:36 plantae slurmctld[29613]: error:
slurm_persist_conn_open: failed to send persistent
connection init message to localhost:6819<br class="">
Nov 29 12:56:36 plantae slurmctld[29613]: error: slurmdbd:
Sending PersistInit msg: Protocol authentication error<br class="">
Nov 29 12:56:36 plantae slurmctld[29613]: fatal: It appears
you don't have any association data from your database. The
priority/multifactor plugin requires this information to run
correctly. Please check your database connection and try
again.<br class="">
Nov 29 12:56:36 plantae systemd[1]: slurmctld.service: Main
process exited, code=exited, status=1/FAILURE<br class="">
Nov 29 12:56:36 plantae systemd[1]: slurmctld.service: Unit
entered failed state.<br class="">
Nov 29 12:56:36 plantae systemd[1]: slurmctld.service:
Failed with result 'exit-code'.</blockquote>
<div class=""><br class="">
</div>
<div class=""> </div>
</div>
</div>
</blockquote>
<br class="">
</div></div></div>
</blockquote></div><br class=""></div>
</div></blockquote></div><br class=""></div></div></body></html>