<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
text-align:right;
direction:rtl;
unicode-bidi:embed;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:.5in;
margin-bottom:0in;
margin-left:0in;
margin-bottom:.0001pt;
text-align:right;
direction:rtl;
unicode-bidi:embed;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri","sans-serif";
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.25in 1.0in 1.25in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal" style="text-align:left;direction:ltr;unicode-bidi:embed">Hello,<o:p></o:p></p>
<p class="MsoNormal" style="text-align:left;direction:ltr;unicode-bidi:embed">I have a node that, for some reason, changes state to "Down" every few minutes.<o:p></o:p></p>
<p class="MsoNormal" style="text-align:left;direction:ltr;unicode-bidi:embed">When I set it to "resume" with scontrol, it is OK until it goes Down again.<o:p></o:p></p>
<p class="MsoNormal" style="text-align:left;direction:ltr;unicode-bidi:embed">In the Slurm server log I can see this error:
<o:p></o:p></p>
<p class="MsoNormal" style="text-align:left;direction:ltr;unicode-bidi:embed">"agent/is_node_resp: node:myName1 RPC:REQUEST_PING : Can't find an address, check slurm.conf"<o:p></o:p></p>
<p class="MsoNormal" style="text-align:left;direction:ltr;unicode-bidi:embed"><o:p> </o:p></p>
<p class="MsoNormal" style="text-align:left;direction:ltr;unicode-bidi:embed">Now, the error message seems fairly straightforward, but I can't find the problem.<o:p></o:p></p>
<p class="MsoNormal" style="text-align:left;direction:ltr;unicode-bidi:embed">* The node is up and answers ping from the Slurm server.<o:p></o:p></p>
<p class="MsoNormal" style="text-align:left;direction:ltr;unicode-bidi:embed">* The Slurm daemon on the node is up and running.<o:p></o:p></p>
<p class="MsoNormal" style="text-align:left;direction:ltr;unicode-bidi:embed">* There isn't any error on the node itself.<o:p></o:p></p>
<p class="MsoNormal" style="text-align:left;direction:ltr;unicode-bidi:embed">* There are more nodes, configured the same way (except for the IP address), that are OK.<o:p></o:p></p>
<p class="MsoNormal" style="text-align:left;direction:ltr;unicode-bidi:embed">* Running "scontrol update state=resume nodename=myNode" fixes the problem for a short time.<o:p></o:p></p>
<p class="MsoNormal" style="text-align:left;direction:ltr;unicode-bidi:embed">* Restarting the Slurm daemon on the node also fixes this for a short time.<o:p></o:p></p>
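<p class="MsoNormal" style="text-align:left;direction:ltr;unicode-bidi:embed">Since the error mentions an address, one more check beyond the list above might be whether the node name resolves on the Slurm server. Below is a minimal sketch in Python, assuming the name used in slurm.conf is expected to resolve through the system resolver. The helper name resolve_node is hypothetical, and the sketch does not query Slurm itself, so it can only rule out a plain name-resolution failure.<o:p></o:p></p>
<pre>
```python
import socket


def resolve_node(name):
    """Return the sorted addresses the system resolver gives for name, or [] if none."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        # The resolver could not map the name to any address.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the address as a string.
    return sorted({info[4][0] for info in infos})
```
</pre>
<p class="MsoNormal" style="text-align:left;direction:ltr;unicode-bidi:embed">Calling resolve_node("myNode") on the Slurm server (with the real node name in place of the placeholder) and on a working node would show whether the two see the same address. An empty list would suggest a resolver problem rather than a Slurm one.<o:p></o:p></p>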
<p class="MsoNormal" style="text-align:left;direction:ltr;unicode-bidi:embed"><o:p> </o:p></p>
<p class="MsoNormal" style="text-align:left;direction:ltr;unicode-bidi:embed">Any idea what else I can check to resolve this?<o:p></o:p></p>
</div>
</body>
</html>