<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body smarttemplateinserted="true">
    <div id="smartTemplate4-template"></div>
    Make sure that the "hostname" command returns the same name that
    Slurm expects on your compute nodes.<br>
    <br>
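    One rough way to run that check on each node is the Python sketch below. It assumes the node names in slurm.conf are the short hostnames (what "hostname -s" prints); the helper names short_name and hostname_matches are made up for this example and are not part of Slurm.<br>

```python
import socket


def short_name(hostname):
    """Return the part of a hostname before the first dot (like `hostname -s`)."""
    return hostname.split(".", 1)[0]


def hostname_matches(expected_node_name, hostname=None):
    """Check whether a machine's short hostname equals the expected node name.

    expected_node_name is assumed to be the NodeName used in slurm.conf.
    When hostname is None, the local machine's hostname is used.
    """
    if hostname is None:
        hostname = socket.gethostname()
    return short_name(hostname) == expected_node_name


if __name__ == "__main__":
    # "myName1" is the node name from the error message quoted in this thread.
    print(hostname_matches("myName1"))
```

    Run it on the node that keeps going Down. If it prints False, the name the node reports differs from the name the controller is looking up.<br>
    <br>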
    <div id="smartTemplate4-quoteHeader">
      <hr> <b>From:</b> Zohar Roe Mlm <a class="moz-txt-link-rfc2396E" href="mailto:RZohar8@iai.co.il">&lt;RZohar8@iai.co.il&gt;</a> <br>
      <b>Sent:</b> Thursday, October 25, 2018 3:02 AM <br>
      <b>To:</b> 'Slurm User Community List'
      <a class="moz-txt-link-rfc2396E" href="mailto:slurm-users@lists.schedmd.com">&lt;slurm-users@lists.schedmd.com&gt;</a><br>
      <b>Cc:</b> <br>
      <b>Subject:</b> Re: [slurm-users] Can't find an address <br>
      <p></p>
    </div>
    <span type="cite"
      cite="mid:3B20705E88247041988D4BB1DF2D44CC014BA2A9D1@EXS12.iai.co.il"
      style="display: block; word-break: break-all; margin: 9px 0 0 0;
      padding: 0; line-height:0"></span>
    <div class="WordSection1">
      <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Hi
          Lachlan,<o:p></o:p></span></p>
      <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
      <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Thanks
          for the reply. I am trying to find more ideas for this
          problem. It may be a system issue or some strange communication
          problem.<o:p></o:p></span></p>
      <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">As
          for your suggestions:<o:p></o:p></span></p>
      <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">&gt;
          Check that it's in /etc/hosts --&gt; It is, and the node answers
          ping by both IP address and hostname every time I check.<o:p></o:p></span></p>
      <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">&gt;
          Check the slurmd logs --&gt; The node log shows no errors. The
          server log shows the error I quoted
          ("agent/is_node_resp: node:myName1 RPC:REQUEST_PING : Can't
          find an address, check slurm.conf").<o:p></o:p></span></p>
      <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">&gt;
          Make sure there is enough disk space --&gt; There is more than
          enough.<o:p></o:p></span></p>
      <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">&gt;
          Make sure that its datetime is synchronized with the others
          --&gt; The time and date are the same on all nodes and on the
          Slurm server.<o:p></o:p></span></p>
      <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
      <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">The
          problem is that I don't see any other errors, and the node is
          up and running without any errors.<o:p></o:p></span></p>
      <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Communication
          looks fine and ping works, but it still looks as if the server
          can't find the node (this happens every two minutes, without
          exception).<o:p></o:p></span></p>
      <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
      <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Thanks
          for your ideas,<o:p></o:p></span></p>
      <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Roy.<o:p></o:p></span></p>
      <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
      <p class="MsoNormal"><b><span
style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;">From:</span></b><span
style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;">
          slurm-users [<a class="moz-txt-link-freetext" href="mailto:slurm-users-bounces@lists.schedmd.com">mailto:slurm-users-bounces@lists.schedmd.com</a>]
          <b>On Behalf Of </b>Lachlan Musicman<br>
          <b>Sent:</b> Thursday, October 25, 2018 1:59 AM<br>
          <b>To:</b> Slurm User Community List<br>
          <b>Subject:</b> Re: [slurm-users] Can't find an address<o:p></o:p></span></p>
      <p class="MsoNormal"><o:p> </o:p></p>
      <div>
        <p class="MsoNormal" style="margin-bottom:12.0pt"><o:p> </o:p></p>
        <div>
          <div>
            <p class="MsoNormal">On Wed, 24 Oct 2018 at 22:56, Zohar Roe
              MLM &lt;<a href="mailto:RZohar8@iai.co.il"
                moz-do-not-send="true">RZohar8@iai.co.il</a>&gt; wrote:<o:p></o:p></p>
          </div>
          <blockquote style="border:none;border-left:solid #CCCCCC
            1.0pt;padding:0in 0in 0in
            6.0pt;margin-left:4.8pt;margin-right:0in">
            <div>
              <div>
                <p class="MsoNormal"
                  style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Hello,<o:p></o:p></p>
                <p class="MsoNormal"
                  style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">I
                  have a node that for some reason changes state to
                  "Down" every few minutes.<o:p></o:p></p>
                <p class="MsoNormal"
                  style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">When
                  I set it to "resume" with scontrol, it is OK until it
                  goes Down again.<o:p></o:p></p>
                <p class="MsoNormal"
                  style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">In
                  the Slurm server log I can see this error:
                  <o:p></o:p></p>
                <p class="MsoNormal"
                  style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">"agent/is_node_resp:
                  node:myName1 RPC:REQUEST_PING : Can't find an address,
                  check slurm.conf"<o:p></o:p></p>
                <p class="MsoNormal"
                  style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
                <p class="MsoNormal"
                  style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Now,
                  the error message seems fairly straightforward, but I
                  can't find the problem.<o:p></o:p></p>
                <p class="MsoNormal"
                  style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">*
                  The node is up and answers ping from the Slurm
                  server.<o:p></o:p></p>
                <p class="MsoNormal"
                  style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">*
                  The Slurm daemon on the node is up and running.<o:p></o:p></p>
                <p class="MsoNormal"
                  style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">*
                  There isn't any error on the node itself.<o:p></o:p></p>
                <p class="MsoNormal"
                  style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">*
                  There are other nodes, configured the same way (apart
                  from the IP address), that are OK.<o:p></o:p></p>
                <p class="MsoNormal"
                  style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">*
                  Running "scontrol update state=resume nodename=myNode"
                  fixes the problem for a short time.<o:p></o:p></p>
                <p class="MsoNormal"
                  style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">*
                  Restarting the Slurm daemon on the node also fixes this
                  for a short time.<o:p></o:p></p>
                <p class="MsoNormal"
                  style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
                <p class="MsoNormal"
                  style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Any
                  ideas on what else I can check to resolve this?<o:p></o:p></p>
              </div>
            </div>
          </blockquote>
          <div>
            <p class="MsoNormal"><o:p> </o:p></p>
          </div>
          <div>
            <p class="MsoNormal">Here's a quick off-the-top-of-my-head
              checklist:<o:p></o:p></p>
          </div>
          <div>
            <p class="MsoNormal"><o:p> </o:p></p>
          </div>
          <div>
            <p class="MsoNormal">Check that it's in /etc/hosts<o:p></o:p></p>
          </div>
          <div>
            <p class="MsoNormal">Check the slurmd logs<o:p></o:p></p>
          </div>
          <div>
            <p class="MsoNormal">Make sure there is enough disk space<o:p></o:p></p>
          </div>
          <div>
            <p class="MsoNormal">Make sure that its datetime is
              synchronized with the others<o:p></o:p></p>
          </div>
          <div>
            <p class="MsoNormal"><o:p> </o:p></p>
          </div>
          <div>
            <p class="MsoNormal">cheers<o:p></o:p></p>
          </div>
          <div>
            <p class="MsoNormal">L.<o:p></o:p></p>
          </div>
          <div>
            <p class="MsoNormal"><o:p> </o:p></p>
          </div>
        </div>
      </div>
    </div>
  </body>
</html>