<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Did you upgrade SLURM or is it a fresh install?<div class=""><br class=""><div class=""><div class="">Are there any associations set? For instance, did you create the cluster with sacctmgr?</div><div class=""><font face="Menlo" class="">sacctmgr add cluster &lt;name&gt;</font></div><div class=""><br class=""></div><div class="">Is the MariaDB/MySQL server running? Is slurmdbd running and working? Try a simple test, such as:</div><div class=""><pre class="">sacctmgr show user -s</pre><div class="">If it was an upgrade, did you try running slurmdbd and slurmctld manually first:</div></div><div class=""><br class=""></div><div class=""><font face="Menlo" class="">slurmdbd -Dvvvvv</font></div><div class=""><br class=""></div><div class="">Then the controller:</div><div class=""><br class=""></div><div class=""><font face="Menlo" class="">slurmctld -Dvvvvv</font></div><div class=""><br class=""></div><div class="">Which OS is that?</div><div class="">Is there a firewall, SELinux, or any ACLs in the way?</div><div class=""><br class=""></div><div class="">Cheers,</div><div class="">Barbara</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><div><blockquote type="cite" class=""><div class="">On 29 Nov 2017, at 15:19, Bruno Santos &lt;<a href="mailto:bacmsantos@gmail.com" class="">bacmsantos@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Thank you Barbara, <div class=""><br class=""></div><div class="">Unfortunately, it does not seem to be a munge problem. Munge can successfully authenticate with the nodes. 
</div><div class=""><br class=""></div><div class="">I have increased the verbosity level and restarted the slurmctld and now I am getting more information about this:</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Nov 29 14:08:16 plantae slurmctld[30340]: Registering slurmctld at port 6817 with slurmdbd.</blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Nov 29 14:08:16 plantae slurmctld[30340]: error: slurm_persist_conn_open: Something happened with the receiving/processing of the persistent connection init message to localhost:6819: Initial RPC not DBD_INIT</blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Nov 29 14:08:16 plantae slurmctld[30340]: error: slurmdbd: Sending PersistInit msg: No error</blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Nov 29 14:08:16 plantae slurmctld[30340]: error: slurm_persist_conn_open: Something happened with the receiving/processing of the persistent connection init message to localhost:6819: Initial RPC not DBD_INIT</blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Nov 29 14:08:16 plantae slurmctld[30340]: error: slurmdbd: Sending PersistInit msg: No error</blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Nov 29 14:08:16 plantae slurmctld[30340]: fatal: It appears you don't have any association data from your database.  The priority/multifactor plugin requires this information to run correctly.  
Please check your database connection and try again.</blockquote></blockquote><div class=""><br class=""></div><div class="">The problem seems to somehow be related to slurmdbd?  </div><div class="">I am a bit lost at this point, to be honest. </div><div class=""><br class=""></div><div class="">Best,</div><div class="">Bruno</div></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On 29 November 2017 at 14:06, Barbara Krašovec <span dir="ltr" class="">&lt;<a href="mailto:barbara.krasovec@ijs.si" target="_blank" class="">barbara.krasovec@ijs.si</a>&gt;</span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class="">Hello,<div class=""><br class=""></div><div class="">Does munge work?</div><div class="">Check whether decoding works locally:</div><div class=""><span style="font-family:monospace" class="">munge -n | unmunge</span></div><div class="">Check whether decoding works remotely:</div><div class=""><code class="">munge -n | ssh &lt;somehost_in_cluster&gt; unmunge</code></div><div class=""><code class=""><br class=""></code></div><div class=""><code class=""><font face="Helvetica" class="">It seems that the munge keys do not match...</font></code></div><div class=""><br class=""></div><div class="">See comments inline.<br class=""><div class=""><br class=""><div class=""><span class=""><blockquote type="cite" class=""><div class="">On 29 Nov 2017, at 14:40, Bruno Santos &lt;<a href="mailto:bacmsantos@gmail.com" target="_blank" class="">bacmsantos@gmail.com</a>&gt; wrote:</div><br class="m_-4368466949655319384Apple-interchange-newline"><div class=""><div dir="ltr" class="">I actually just managed to figure that one out. 
<div class=""><br class=""></div><div class="">The problem was that I had set AccountingStoragePass=magic in the slurm.conf file. After re-reading the documentation, it seems this is only needed if I have a different munge instance controlling the logins to the database, which I don't. </div><div class="">Commenting that line out seems to have worked; however, I am now getting a different error: </div><div class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Nov 29 13:19:20 plantae slurmctld[29984]: Registering slurmctld at port 6817 with slurmdbd.<br class="">Nov 29 13:19:20 plantae slurmctld[29984]: error: slurm_persist_conn_open: Something happened with the receiving/processing of the persistent connection init message to localhost:6819: Initial RPC not DBD_INIT<br class="">Nov 29 13:19:20 plantae systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE<br class="">Nov 29 13:19:20 plantae systemd[1]: slurmctld.service: Unit entered failed state.<br class="">Nov 29 13:19:20 plantae systemd[1]: slurmctld.service: Failed with result 'exit-code'.</blockquote><div class=""><br class=""></div><div class="">My slurm.conf looks like this:</div><div class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"># LOGGING AND ACCOUNTING<br class="">AccountingStorageHost=<wbr class="">localhost<br class="">AccountingStorageLoc=slurm_db<br class="">#AccountingStoragePass=magic<br class="">#AccountingStoragePort=<br class="">AccountingStorageType=<wbr class="">accounting_storage/slurmdbd<br class="">AccountingStorageUser=slurm<br class="">AccountingStoreJobComment=YES<br class="">ClusterName=research<br class="">JobCompType=jobcomp/none<br class="">JobAcctGatherFrequency=30<br class="">JobAcctGatherType=jobacct_<wbr class="">gather/none<br class="">SlurmctldDebug=3<br 
class="">SlurmdDebug=3</blockquote></div></div></div></div></blockquote><div class=""><br class=""></div></span>You only need:</div><div class=""><div class="">AccountingStorageEnforce=<wbr class="">associations,limits,qos</div><div class="">AccountingStorageHost=&lt;<wbr class="">hostname&gt;</div><div class="">AccountingStorageType=<wbr class="">accounting_storage/slurmdbd</div><div class=""><br class=""></div><div class="">You can remove AccountingStorageLoc and AccountingStorageUser.</div><span class=""><div class=""><br class=""></div><div class=""><br class=""></div><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div class=""><div class=""><br class=""></div><div class="">And the slurmdbd.conf looks like this:</div><div class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">ArchiveEvents=yes<br class="">ArchiveJobs=yes<br class="">ArchiveResvs=yes<br class="">ArchiveSteps=no<br class="">#ArchiveTXN=no<br class="">#ArchiveUsage=no<br class=""># Authentication info<br class="">AuthType=auth/munge<br class="">AuthInfo=/var/run/munge/munge.<wbr class="">socket.2</blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">#Database info<br class=""># slurmDBD info<br class="">DbdAddr=plantae<br class="">DbdHost=plantae<br class=""># Database info<br class="">StorageType=accounting_<wbr class="">storage/mysql<br class="">StorageHost=localhost<br class="">SlurmUser=slurm<br class="">StoragePass=magic<br class="">StorageUser=slurm<br class="">StorageLoc=slurm_db</blockquote></div><div class=""><br class=""></div><div class=""><br class=""></div></div><div class="">Thank you very much in advance. 
</div><div class=""><br class=""></div><div class="">Best,</div><div class="">Bruno </div></div></div></div></blockquote><div class=""><br class=""></div></span>Cheers,</div><div class="">Barbara</div><div class=""><div class="h5"><div class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div class=""><br class=""></div></div></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On 29 November 2017 at 13:28, Andy Riebs <span dir="ltr" class="">&lt;<a href="mailto:andy.riebs@hpe.com" target="_blank" class="">andy.riebs@hpe.com</a>&gt;</span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000" class="">
    It looks like you don't have the munged daemon running.<div class=""><div class="m_-4368466949655319384h5"><br class="">
    <br class="">
    <div class="m_-4368466949655319384m_4940209100258878838moz-cite-prefix">On 11/29/2017 08:01 AM, Bruno Santos
      wrote:<br class="">
    </div>
    <blockquote type="cite" class="">
      
      <div dir="ltr" class="">Hi everyone,
        <div class=""><br class="">
        </div>
        <div class="">I have set up Slurm to use slurm_db and all was working
          fine. However, I had to change slurm.conf to adjust user
          priority, and upon restarting, slurmctld fails with the
          messages below. It seems that it is somehow trying to
          use the MySQL password as a munge socket? </div>
        <div class="">Any idea how to solve it? </div>
        <div class=""> </div>
        <div class="">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Nov 29 12:56:30 plantae
            slurmctld[29613]: Registering slurmctld at port 6817 with
            slurmdbd.<br class="">
            Nov 29 12:56:32 plantae slurmctld[29613]: error: If munged
            is up, restart with --num-threads=10<br class="">
            Nov 29 12:56:32 plantae slurmctld[29613]: error: Munge
            encode failed: Failed to access "magic": No such file or
            directory<br class="">
            Nov 29 12:56:32 plantae slurmctld[29613]: error:
            authentication: Socket communication error<br class="">
            Nov 29 12:56:32 plantae slurmctld[29613]: error:
            slurm_persist_conn_open: failed to send persistent
            connection init message to localhost:6819<br class="">
            Nov 29 12:56:32 plantae slurmctld[29613]: error: slurmdbd:
            Sending PersistInit msg: Protocol authentication error<br class="">
            Nov 29 12:56:34 plantae slurmctld[29613]: error: If munged
            is up, restart with --num-threads=10<br class="">
            Nov 29 12:56:34 plantae slurmctld[29613]: error: Munge
            encode failed: Failed to access "magic": No such file or
            directory<br class="">
            Nov 29 12:56:34 plantae slurmctld[29613]: error:
            authentication: Socket communication error<br class="">
            Nov 29 12:56:34 plantae slurmctld[29613]: error:
            slurm_persist_conn_open: failed to send persistent
            connection init message to localhost:6819<br class="">
            Nov 29 12:56:34 plantae slurmctld[29613]: error: slurmdbd:
            Sending PersistInit msg: Protocol authentication error<br class="">
            Nov 29 12:56:36 plantae slurmctld[29613]: error: If munged
            is up, restart with --num-threads=10<br class="">
            Nov 29 12:56:36 plantae slurmctld[29613]: error: Munge
            encode failed: Failed to access "magic": No such file or
            directory<br class="">
            Nov 29 12:56:36 plantae slurmctld[29613]: error:
            authentication: Socket communication error<br class="">
            Nov 29 12:56:36 plantae slurmctld[29613]: error:
            slurm_persist_conn_open: failed to send persistent
            connection init message to localhost:6819<br class="">
            Nov 29 12:56:36 plantae slurmctld[29613]: error: slurmdbd:
            Sending PersistInit msg: Protocol authentication error<br class="">
            Nov 29 12:56:36 plantae slurmctld[29613]: fatal: It appears
            you don't have any association data from your database.  The
            priority/multifactor plugin requires this information to run
            correctly.  Please check your database connection and try
            again.<br class="">
            Nov 29 12:56:36 plantae systemd[1]: slurmctld.service: Main
            process exited, code=exited, status=1/FAILURE<br class="">
            Nov 29 12:56:36 plantae systemd[1]: slurmctld.service: Unit
            entered failed state.<br class="">
            Nov 29 12:56:36 plantae systemd[1]: slurmctld.service:
            Failed with result 'exit-code'.</blockquote>
          <div class=""><br class="">
          </div>
          <div class=""> </div>
        </div>
      </div>
    </blockquote>
    <br class="">
  </div></div></div>

</blockquote></div><br class=""></div>
</div></blockquote></div><br class=""></div></div></div></div></div></blockquote></div><br class=""></div>
</div></blockquote></div><br class=""></div></div></div></body></html>