<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body dir="auto">
<div>It very much rang a bell!</div>
<div><br>
</div>
I think there is also an scontrol command that shows the actual running config (probably “scontrol show config”). Its output includes the defaults, which helps if you are seeing a value that you haven’t specified in the config file.
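<div>If it helps, here is a rough sketch of reading that output from Python. It assumes `scontrol show config` prints `Key = Value` lines; `parse_show_config` and the sample text are made up for illustration and were not captured from a real cluster.</div>
<pre>
```python
def parse_show_config(text):
    """Parse 'Key = Value' lines, as assumed for `scontrol show config`, into a dict."""
    settings = {}
    for line in text.splitlines():
        # Split on the first '=' only, so values containing '=' stay intact.
        key, sep, value = line.partition("=")
        if sep and key.strip():
            settings[key.strip()] = value.strip()
    return settings


# Illustrative sample only (hypothetical host and cluster names).
sample = """\
AccountingStorageHost   = dbd-host
AccountingStoragePort   = 6819
AccountingStorageType   = accounting_storage/slurmdbd
ClusterName             = example
"""

config = parse_show_config(sample)
# Keep only the accounting-related settings.
accounting = {k: v for k, v in config.items() if k.startswith("AccountingStorage")}
```
</pre>
<div>On a live system the text would come from something like `subprocess.run(["scontrol", "show", "config"], capture_output=True, text=True).stdout`.</div>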
<div><br>
<div dir="ltr"><br>
<blockquote type="cite">On Jun 15, 2022, at 21:35, Reed Dier <reed.dier@focusvq.com> wrote:<br>
<br>
</blockquote>
</div>
<blockquote type="cite">
<div dir="ltr"> Well, you nailed it.
<div class=""><br class="">
</div>
<div class="">Honestly a little surprised it was working to begin with.</div>
<div class=""><br class="">
</div>
<div class="">In the DBD conf</div>
<div class="">
<blockquote type="cite" class="">
<div class="">-#DbdPort=7031</div>
<div class="">+DbdPort=7031</div>
</blockquote>
<div class=""><br class="">
</div>
And then in the slurm.conf</div>
<div class="">
<blockquote type="cite" class="">
<div class="">-#AccountingStoragePort=3306</div>
<div class="">+AccountingStoragePort=7031</div>
</blockquote>
<div class=""><br class="">
</div>
I’m not sure how my slurm.conf showed the 3306 mysql port commented out.</div>
<div class="">I did confirm that the slurmdbd was listening on 6819 before, so I assumed that the default would be 6819 on the dbd and the “client” (ctld or otherwise) side, but somehow that wasn’t the case?</div>
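<div class="">For anyone hitting the same thing, a minimal sketch of the mismatch check, assuming both files use plain `Key=Value` lines and that an unset or commented-out port falls back to the documented default of 6819 on both sides. The helper names and the sample file contents are hypothetical.</div>
<pre class="">
```python
DEFAULT_DBD_PORT = 6819  # assumed default when the key is unset or commented out


def read_port(conf_text, key):
    """Return the port set for `key` in Key=Value text, or the assumed default."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        name, sep, value = line.partition("=")
        # Slurm config keys are case-insensitive.
        if sep and name.strip().lower() == key.lower():
            return int(value.strip())
    return DEFAULT_DBD_PORT


def ports_match(slurm_conf, slurmdbd_conf):
    """True when the port in slurm.conf equals the port slurmdbd listens on."""
    client_port = read_port(slurm_conf, "AccountingStoragePort")
    server_port = read_port(slurmdbd_conf, "DbdPort")
    return client_port == server_port


# Hypothetical file contents reflecting the settings after the change above.
slurm_conf = "ClusterName=example\nAccountingStoragePort=7031\n"
slurmdbd_conf = "DbdHost=localhost\nDbdPort=7031\n"
matched = ports_match(slurm_conf, slurmdbd_conf)
```
</pre>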
<div class=""><br class="">
</div>
<div class="">Either way, I do feel like things are getting back to the right state.</div>
<div class="">So thank you so much for pointing me in the correct direction.</div>
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class="">Reed <br class="">
<div><br class="">
<blockquote type="cite" class="">
<div class="">On Jun 15, 2022, at 7:50 PM, Ryan Novosielski <<a href="mailto:novosirj@rutgers.edu" class="">novosirj@rutgers.edu</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div dir="auto" class="">Apologies for not having more concrete information available when I’m replying to you, but I figured maybe having a fast hint might be better.
<div class=""><br class="">
</div>
<div class="">Have a look at how the various daemons communicate with one another. This sounds to me like a firewall thing, maybe between the SlurmCtld and wherever the SlurmDBD is running right now, or the other way around. The “sacctmgr show cluster”
output is the giveaway: it is populated dynamically, not pulled from a config file. </div>
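<div class="">A quick way to test that from the SlurmCtld host is a plain TCP connect to the SlurmDBD port. Here is a minimal sketch; `can_connect` is a made-up helper name, and the host and port you pass in are whatever your setup uses.</div>
<pre class="">
```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```
</pre>
<div class="">For example, `can_connect("dbd-host", 6819)`, where `dbd-host` is a placeholder and 6819 is the documented default DbdPort. A False result only says the connection failed; it does not say whether a firewall or a stopped daemon is the cause.</div>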
<div class=""><br class="">
</div>
<div class="">I ran into this exact thing years ago, but I can’t remember which side the firewall problem was on. <br class="">
<br class="">
<div dir="ltr" class=""><br class="">
<blockquote type="cite" class="">On Jun 15, 2022, at 20:12, Reed Dier <<a href="mailto:reed.dier@focusvq.com" class="">reed.dier@focusvq.com</a>> wrote:<br class="">
<br class="">
</blockquote>
</div>
<blockquote type="cite" class="">
<div dir="ltr" class=""> Hoping this is an easy answer.
<div class=""><br class="">
</div>
<div class="">My mysql instance somehow corrupted itself, and I’m having to purge and start over.</div>
<div class="">This is ok, because the data in there isn’t too valuable, and we aren’t making use of associations or anything like that yet (no AccountingStorageEnforce).</div>
<div class=""><br class="">
</div>
<div class="">That said, I’ve decided to put the dbd’s mysql instance on my main database server, rather than in a small vm alongside the dbd.</div>
<div class="">Jobs are still submitting alright, and after adding the cluster back with `sacctmgr create cluster $cluster` it seems to have stopped the log firehose.</div>
<div class="">The issue I’m mainly seeing now is in the dbd logs:</div>
<div class=""><br class="">
</div>
<div class="">
<blockquote type="cite" class="">
<div class=""><font face="Menlo" class="">[2022-06-15T19:40:43.064] error: _add_registered_cluster: trying to register a cluster ($cluster) with no remote port</font></div>
<div class=""><font face="Menlo" class="">[2022-06-15T19:40:43.065] error: _add_registered_cluster: trying to register a cluster ($cluster) with no remote port</font></div>
</blockquote>
<blockquote type="cite" class="">
<div class=""><font face="Menlo" class="">[2022-06-15T19:45:39.827] error: _add_registered_cluster: trying to register a cluster ($cluster) with no remote port</font></div>
</blockquote>
<blockquote type="cite" class="">
<div class=""><font face="Menlo" class="">[2022-06-15T19:48:01.038] error: _add_registered_cluster: trying to register a cluster ($cluster) with no remote port</font></div>
</blockquote>
<blockquote type="cite" class="">
<div class=""><font face="Menlo" class="">[2022-06-15T19:48:01.039] error: _add_registered_cluster: trying to register a cluster ($cluster) with no remote port</font></div>
</blockquote>
<blockquote type="cite" class="">
<div class=""><font face="Menlo" class="">[2022-06-15T19:48:38.104] error: _add_registered_cluster: trying to register a cluster ($cluster) with no remote port</font></div>
</blockquote>
<blockquote type="cite" class="">
<div class=""><font face="Menlo" class="">[2022-06-15T19:50:39.290] error: _add_registered_cluster: trying to register a cluster ($cluster) with no remote port</font></div>
</blockquote>
<blockquote type="cite" class="">
<div class=""><font face="Menlo" class="">[2022-06-15T19:55:39.769] error: _add_registered_cluster: trying to register a cluster ($cluster) with no remote port</font></div>
</blockquote>
<div class=""><br class="">
</div>
And if I run </div>
<div class="">
<blockquote type="cite" class="">
<div class=""><font face="Menlo" class="">$ sacctmgr show cluster</font></div>
<div class=""><font face="Menlo" class=""> Cluster ControlHost ControlPort RPC Share GrpJobs GrpTRES GrpSubmit MaxJobs MaxTRES MaxSubmit MaxWall QOS Def QOS</font></div>
<div class=""><font face="Menlo" class=""> ---------- --------------- ------------ ----- --------- ------- ------------- --------- ------- ------------- --------- ----------- -------------------- ---------</font></div>
<div class=""><font face="Menlo" class=""> $cluster 0 0 1 normal</font></div>
</blockquote>
<br class="">
I can see the ControlHost, ControlPort, and RPC are all missing.</div>
<div class="">So I’m not sure what I need to do to effectively reset my dbd.</div>
<div class="">Also, $cluster in sacctmgr matches ClusterName=$cluster in my slurm.conf.</div>
<div class=""><br class="">
</div>
<div class="">The only thing that has changed is the StorageHost in the dbd conf. On the new database server, I created the database and user, and granted all privileges on slurm_acct_db.*.</div>
<div class="">And I’ve verified that it has made tables, and that I can connect from the host with the correct credentials.</div>
<div class=""><br class="">
</div>
<div class="">
<blockquote type="cite" class="">
<div class=""><font face="Menlo" class="">mysql> show tables;</font></div>
<div class=""><font face="Menlo" class="">+----------------------------------+</font></div>
<div class=""><font face="Menlo" class="">| Tables_in_slurm_acct_db |</font></div>
<div class=""><font face="Menlo" class="">+----------------------------------+</font></div>
<div class=""><font face="Menlo" class="">| acct_coord_table |</font></div>
<div class=""><font face="Menlo" class="">| acct_table |</font></div>
<div class=""><font face="Menlo" class="">| $cluster_assoc_table |</font></div>
<div class=""><font face="Menlo" class="">| $cluster_assoc_usage_day_table |</font></div>
<div class=""><font face="Menlo" class="">| $cluster_assoc_usage_hour_table |</font></div>
<div class=""><font face="Menlo" class="">| $cluster_assoc_usage_month_table |</font></div>
<div class=""><font face="Menlo" class="">| $cluster_event_table |</font></div>
<div class=""><font face="Menlo" class="">| $cluster_job_table |</font></div>
<div class=""><font face="Menlo" class="">| $cluster_last_ran_table |</font></div>
<div class=""><font face="Menlo" class="">| $cluster_resv_table |</font></div>
<div class=""><font face="Menlo" class="">| $cluster_step_table |</font></div>
<div class=""><font face="Menlo" class="">| $cluster_suspend_table |</font></div>
<div class=""><font face="Menlo" class="">| $cluster_usage_day_table |</font></div>
<div class=""><font face="Menlo" class="">| $cluster_usage_hour_table |</font></div>
<div class=""><font face="Menlo" class="">| $cluster_usage_month_table |</font></div>
<div class=""><font face="Menlo" class="">| $cluster_wckey_table |</font></div>
<div class=""><font face="Menlo" class="">| $cluster_wckey_usage_day_table |</font></div>
<div class=""><font face="Menlo" class="">| $cluster_wckey_usage_hour_table |</font></div>
<div class=""><font face="Menlo" class="">| $cluster_wckey_usage_month_table |</font></div>
<div class=""><font face="Menlo" class="">| clus_res_table |</font></div>
<div class=""><font face="Menlo" class="">| cluster_table |</font></div>
<div class=""><font face="Menlo" class="">| convert_version_table |</font></div>
<div class=""><font face="Menlo" class="">| federation_table |</font></div>
<div class=""><font face="Menlo" class="">| qos_table |</font></div>
<div class=""><font face="Menlo" class="">| res_table |</font></div>
<div class=""><font face="Menlo" class="">| table_defs_table |</font></div>
<div class=""><font face="Menlo" class="">| tres_table |</font></div>
<div class=""><font face="Menlo" class="">| txn_table |</font></div>
<div class=""><font face="Menlo" class="">| user_table |</font></div>
<div class=""><font face="Menlo" class="">+----------------------------------+</font></div>
<div class=""><font face="Menlo" class="">29 rows in set (0.01 sec)</font></div>
</blockquote>
</div>
<div class="">
<div style="caret-color: rgb(0, 0, 0);" class=""><br class="">
</div>
<div style="caret-color: rgb(0, 0, 0);" class="">Any tips are appreciated.</div>
</div>
<div class=""><br class="">
</div>
<div class="">
<div style="caret-color: rgb(0, 0, 0);" class="">21.08.7 and Ubuntu 20.04.</div>
<div style="caret-color: rgb(0, 0, 0);" class="">Slurmdbd and slurmctld(1) are running on one host; slurmctld(2), which is the primary, is running on another host.</div>
</div>
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class="">Reed</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</blockquote>
</div>
</body>
</html>