<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>You don't say which OS or version you're using. If you're
using RHEL 7 or a derivative, chrony is the default instead of ntpd,
so it's possible you're looking at ntpd while chronyd is the one
doing the work. If you haven't done so already, I'd check which
daemon is actually running on your system. </p>
<p>Can you share the complete output of 'ntpq -p' with us, and let us
know which nodes the output is from? You might also want to run
'ntpdate' before starting ntpd. If the clocks are too far apart,
ntpd will either refuse to correct the time or take a long time to
do so, whereas ntpdate sets the local clock from the server in one
step. <br>
</p>
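<p>For what it's worth, the ENCODED and DECODED lines in the
slurmctld log you posted (quoted below) already give a rough idea of
how far apart the clocks were at that moment. Here is a minimal
Python sketch of that arithmetic. The two timestamps are copied from
your log; the variable names are just mine, and I'm assuming ENCODED
is the sender's clock and DECODED is the receiver's clock, which is
how I read munge's "Rewound credential" error.</p>

```python
from datetime import datetime

# Timestamps copied from the ENCODED and DECODED lines of the quoted slurmctld log.
# Assumption: ENCODED is the sending host's clock, DECODED is the receiving host's clock.
FMT = "%a %b %d %H:%M:%S %Y"
encoded = datetime.strptime("Tue Oct 27 11:09:45 2020", FMT)
decoded = datetime.strptime("Tue Oct 27 11:02:07 2020", FMT)

# Positive value: the credential was stamped in the receiver's future.
skew_seconds = int((encoded - decoded).total_seconds())
print(f"credential was encoded {skew_seconds} s ahead of the decoding host's clock")
```

<p>That works out to a bit over seven and a half minutes, which is a
lot more than the 0.0 offset ntpq is reporting, so it's worth
confirming what is really setting the clock on each node.</p>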
<p>I would make sure ntpdate is installed and enabled, then reboot
both nodes. That way ntpdate is called at startup before ntpd, and
all of the daemons start out with the correct time. <br>
</p>
<p>--<br>
Prentice<br>
</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 10/27/20 2:08 PM, Gard Nelson wrote:<br>
</div>
<blockquote type="cite"
cite="mid:2E3A44BA-784B-4D03-814A-01E64D84542C@contoso.com">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Hi everyone,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I’m adding a
new node to an existing cluster. After installing slurm and
the prereqs, I synced the clocks with ntpd. When I run ‘ntpq
-p’, I get 0.0 for delay, offset and jitter. (the slurm head
node is also the ntp server) ‘date’ also gives me identical
times for the head and compute nodes. However, when I start
slurmd, I get a munge error about the clocks being out of
sync. From the slurmctld log:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[2020-10-27T11:02:06.511]
node NEW_NODE returned to service<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[2020-10-27T11:02:07.265]
error: Munge decode failed: Rewound credential<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[2020-10-27T11:02:07.265]
ENCODED: Tue Oct 27 11:09:45 2020<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[2020-10-27T11:02:07.265]
DECODED: Tue Oct 27 11:02:07 2020<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[2020-10-27T11:02:07.265]
error: Check for out of sync clocks<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[2020-10-27T11:02:07.265]
error: slurm_unpack_received_msg:
MESSAGE_NODE_REGISTRATION_STATUS has authentication error:
Rewound credential<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[2020-10-27T11:02:07.265]
error: slurm_unpack_received_msg: Protocol authentication
error<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[2020-10-27T11:02:07.275]
error: slurm_receive_msg [HEAD_NODE_IP:PORT]: Unspecified
error<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I restarted
ntp, munge and the slurm daemons on both nodes before this
last error was generated. Any idea what’s going on here?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Thanks,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Gard<o:p></o:p></span></p>
</div>
</blockquote>
<pre class="moz-signature" cols="72">--
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
<a class="moz-txt-link-freetext" href="http://www.pppl.gov">http://www.pppl.gov</a></pre>
</body>
</html>