<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">I think this message can also appear if the slurm.conf on your login node is missing the entry for the slurmd node. Slurm releases from 2020 onward (20.02 and later) can
distribute the configuration automatically through the "configless" mode.<o:p></o:p></span></p>
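<p class="MsoNormal">One quick way to check for a stale copy is to compare the file on the login node with the one on the controller. A minimal sketch, assuming the file is /etc/slurm/slurm.conf (the /etc/slurm directory appears in the slurmd log quoted below; the file location and the helper name conf_digest are assumptions):</p>
<pre>
```python
import hashlib
from pathlib import Path

# Assumed location; adjust to wherever slurm.conf lives on your nodes.
SLURM_CONF = Path("/etc/slurm/slurm.conf")


def conf_digest(path=SLURM_CONF):
    """Return the SHA-256 hex digest of a configuration file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()
```
</pre>
<p class="MsoNormal">Calling conf_digest() on the login node and on the controller and comparing the two strings shows whether the copies are byte-identical. It does not say which entry differs, only that the files need a closer look.</p>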
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif"> slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt;
<b>On Behalf Of </b>Patrick Bégou<br>
<b>Sent:</b> Thursday, November 12, 2020 7:38 AM<br>
<b>To:</b> slurm-users@lists.schedmd.com<br>
<b>Subject:</b> Re: [slurm-users] failed to send msg type 6002: No route to host<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">Hi Slurm admins and developers,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Does no one have an idea about this problem?<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">While still investigating this morning, I discovered that it works from the management node (a small VM running slurmctld), even though I have no home directory there (I use su from root to switch to an unprivileged user). It still doesn't work
from the login node, even with all firewalls disabled :-( <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Patrick<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">On 10/11/2020 at 11:54, Patrick Bégou wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p>Hi,<o:p></o:p></p>
<p>I'm new to Slurm (as an admin) and I need some help. I tested my initial setup with:<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">[begou@tenibre ~]$ <b>salloc -n 1 sh</b><br>
salloc: Granted job allocation 11<br>
sh-4.4$ <b>squeue</b><br>
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)<br>
<b>11 </b> all sh begou R 0:16 1 tenibre-0-0<br>
sh-4.4$<b> srun /usr/bin/hostname</b><br>
srun: error: timeout waiting for task launch, started 0 of 1 tasks<br>
srun: Job step 11.0 aborted before step completely launched.<br>
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.<br>
srun: error: Timed out waiting for job step to complete<o:p></o:p></p>
</blockquote>
<p>I checked the connections:<o:p></o:p></p>
<p><b>tenibre is the login node</b> (no daemon running)<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">nc -v tenibre-0-0 6818 <br>
nc -v management1 6817<o:p></o:p></p>
</blockquote>
<p class="MsoNormal"><b>management1 is the management node</b> (slurmctld running)<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">nc -v tenibre-0-0 6818<o:p></o:p></p>
</blockquote>
<p class="MsoNormal"><b>tenibre-0-0 is the first compute node</b> (slurmd running)
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p>nc -v management1 6817<o:p></o:p></p>
</blockquote>
<p>All tests return "<i>Ncat: Connected...</i>"<o:p></o:p></p>
<p>The command "id begou" works on all nodes and I can reach my home directory on the login node and on the compute node.<o:p></o:p></p>
<p>On the compute node slurmd.log shows:<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">[2020-11-10T11:21:38.050]<b> launch task</b> <b>11.0 </b>request from UID:23455 GID:1036 HOST:172.30.1.254 PORT:42220<br>
[2020-11-10T11:21:38.050] debug: Checking credential with 508 bytes of sig data<br>
[2020-11-10T11:21:38.050] _run_prolog: run job script took usec=12<br>
[2020-11-10T11:21:38.050] _run_prolog: prolog with lock for job 11 ran for 0 seconds<br>
[2020-11-10T11:21:38.053] debug: AcctGatherEnergy NONE plugin loaded<br>
[2020-11-10T11:21:38.053] debug: AcctGatherProfile NONE plugin loaded<br>
[2020-11-10T11:21:38.053] debug: AcctGatherInterconnect NONE plugin loaded<br>
[2020-11-10T11:21:38.053] debug: AcctGatherFilesystem NONE plugin loaded<br>
[2020-11-10T11:21:38.053] debug: switch NONE plugin loaded<br>
[2020-11-10T11:21:38.054] [11.0] debug: Job accounting gather NOT_INVOKED plugin loaded<br>
[2020-11-10T11:21:38.054] [11.0] debug: Message thread started pid = 12099<br>
[2020-11-10T11:21:38.054] debug: task_p_slurmd_reserve_resources: 11 0<br>
[2020-11-10T11:21:38.068] [11.0] debug: task NONE plugin loaded<br>
[2020-11-10T11:21:38.068] [11.0] debug: Checkpoint plugin loaded: checkpoint/none<br>
[2020-11-10T11:21:38.068] [11.0] Munge credential signature plugin loaded<br>
[2020-11-10T11:21:38.068] [11.0] debug: job_container none plugin loaded<br>
[2020-11-10T11:21:38.068] [11.0] debug: mpi type = pmi2<br>
[2020-11-10T11:21:38.068] [11.0] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/freezer/slurm' already exists<br>
[2020-11-10T11:21:38.068] [11.0] debug: spank: opening plugin stack /etc/slurm/plugstack.conf<br>
[2020-11-10T11:21:38.068] [11.0] debug: mpi type = (null)<br>
[2020-11-10T11:21:38.068] [11.0] debug: using mpi/pmi2<br>
[2020-11-10T11:21:38.068] [11.0] debug: _setup_stepd_job_info: SLURM_STEP_RESV_PORTS not found in env<br>
[2020-11-10T11:21:38.068] [11.0] debug: mpi/pmi2: setup sockets<br>
[2020-11-10T11:21:38.069] [11.0] debug: mpi/pmi2: started agent thread<br>
[2020-11-10T11:21:38.069] [11.0]<b> error: connect io: No route to host</b><br>
[2020-11-10T11:21:38.069] [11.0] error: IO setup failed: No route to host<br>
[2020-11-10T11:21:38.069] [11.0] debug: step_terminate_monitor_stop signaling condition<br>
[2020-11-10T11:21:38.069] [11.0] error: job_manager exiting abnormally, rc = 4021<br>
[2020-11-10T11:21:38.069] [11.0] debug: Sending launch resp rc=4021<br>
[2020-11-10T11:21:38.069] [11.0] debug: _send_srun_resp_msg: 0/5 <b>failed to send msg type 6002: No route to host</b><br>
[2020-11-10T11:21:38.169] [11.0] debug: _send_srun_resp_msg: 1/5 failed to send msg type 6002: No route to host<br>
[2020-11-10T11:21:38.370] [11.0] debug: _send_srun_resp_msg: 2/5 failed to send msg type 6002: No route to host<br>
[2020-11-10T11:21:38.770] [11.0] debug: _send_srun_resp_msg: 3/5 failed to send msg type 6002: No route to host<br>
[2020-11-10T11:21:39.570] [11.0] debug: _send_srun_resp_msg: 4/5 failed to send msg type 6002: No route to host<br>
[2020-11-10T11:21:40.370] [11.0] debug: _send_srun_resp_msg: 5/5 failed to send msg type 6002: No route to host<br>
[2020-11-10T11:21:40.372] [11.0] debug: Message thread exited<br>
[2020-11-10T11:21:40.372] [11.0] debug: mpi/pmi2: agent thread exit<br>
[2020-11-10T11:21:40.372] [11.0] <b>done with job</b><o:p></o:p></p>
</blockquote>
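<p>The first line of the excerpt above records the address the launch request arrived from. As far as I understand Slurm's design, the later "connect io" goes from the compute node back to the host running srun, so that address is the one worth checking. A small sketch for pulling it out of the log (the regular expression and the helper name launch_origin are written against this excerpt only and are not an official log format):</p>
<pre>
```python
import re

# Pattern written against the "launch task" line in the excerpt above.
LAUNCH_RE = re.compile(
    r"launch task\s+(\d+\.\d+)\s+request from .*?HOST:(\S+)\s+PORT:(\d+)"
)


def launch_origin(line):
    """Return (step, host, port) from a launch-task log line, or None."""
    match = LAUNCH_RE.search(line)
    if match is None:
        return None
    step, host, port = match.groups()
    return step, host, int(port)


line = ("[2020-11-10T11:21:38.050] launch task 11.0 request from "
        "UID:23455 GID:1036 HOST:172.30.1.254 PORT:42220")
print(launch_origin(line))  # prints ('11.0', '172.30.1.254', 42220)
```
</pre>
<p>The PORT value is the port the request came from, which is not necessarily the port the step later tries to connect to, so only the host part should be relied on when testing reachability.</p>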
<p><o:p> </o:p></p>
<p>But I do not understand what this "No route to host" means.<o:p></o:p></p>
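<p>"No route to host" is the text Linux attaches to the EHOSTUNREACH socket error: a connect() call could not reach the target address, which can come from routing or from a firewall rejecting the packet. The nc checks above all go toward tenibre-0-0 or management1; none goes from the compute node back to the login node, which appears to be the direction of the failing "connect io". A hedged sketch for testing that direction from the compute node (probe and describe are hypothetical helpers; the host and port to pass are assumptions, for example a port on which a temporary listener was started on the login node):</p>
<pre>
```python
import errno
import socket


def describe(exc):
    """Map an OSError to its symbolic errno name, e.g. EHOSTUNREACH."""
    return errno.errorcode.get(exc.errno, str(exc))


def probe(host, port, timeout=3.0):
    """Try one TCP connection and return a short status string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return describe(exc)


# Example call, to be run on the compute node (values are placeholders):
# print(probe("tenibre", 60001))
```
</pre>
<p>A result of EHOSTUNREACH would reproduce the error from slurmd.log outside Slurm, while ECONNREFUSED would only mean that nothing is listening on the chosen port.</p>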
<p><o:p> </o:p></p>
<p>Thanks for your help.<o:p></o:p></p>
<p>Patrick<o:p></o:p></p>
<p><o:p> </o:p></p>
</blockquote>
<p><o:p> </o:p></p>
</div>
</div>
</body>
</html>