<div dir="auto">Hi,<div dir="auto"><br></div><div dir="auto">You should look at this bug: <a href="https://bugs.schedmd.com/show_bug.cgi?id=4412">https://bugs.schedmd.com/show_bug.cgi?id=4412</a></div><div dir="auto"><br></div><div dir="auto">I thought it would be resolved in 17.11.0.</div><div dir="auto"><br></div><div dir="auto">Regards</div><div dir="auto">Matthieu</div></div><div class="gmail_extra"><br><div class="gmail_quote">On 30 Nov 2017 at 00:56, "Andy Riebs" &lt;<a href="mailto:andy.riebs@hpe.com">andy.riebs@hpe.com</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
    <font face="Helvetica, Arial, sans-serif">This afternoon we installed
      Slurm 17.11.0 on our 100+ node x86_64 cluster running CentOS 7.4,
      and we periodically see a single node (perhaps the first
      node in an allocation?) get drained with the message "batch job
      complete failure".</font><br>
<br>
    <font face="Helvetica, Arial, sans-serif">On one of the nodes in question,
      slurmd.log reports:</font><br>
<div class="m_8870140236398436823WordSection1">
<blockquote><font face="Helvetica, Arial, sans-serif"><span style="font-size:10pt;color:black">pam_unix(slurm:session):
open_session - error recovering username</span><span style="font-size:10pt;color:black"> <br>
pam_loginuid(slurm:session): unexpected response from failed
conversation
function </span></font></blockquote>
</div>
<font face="Helvetica, Arial, sans-serif">On another node drained
for the same reason,</font><br>
<blockquote><font face="Helvetica, Arial, sans-serif">error:
pam_open_session: Cannot make/remove an entry for the specified
session<br>
error: error in pam_setup<br>
error: job_manager exiting abnormally, rc = 4020<br>
sending REQUEST_COMPLETE_BATCH_SCRIPT, error:4020 status 0<br>
</font></blockquote>
<font face="Helvetica, Arial, sans-serif">slurmctld has logged</font><br>
    <div class="m_8870140236398436823WordSection1">
<blockquote><font face="Helvetica, Arial, sans-serif"><span style="font-size:10pt;color:black">
error: slurmd error running JobId=33 on node(s)=node048:
Slurmd could not
execve job </span></font><br>
<br>
<font face="Helvetica, Arial, sans-serif"><span style="font-size:10pt;color:black">drain_nodes: node
Summer0c048 state set to DRAIN</span></font></blockquote>
      <font face="Helvetica, Arial, sans-serif"><span style="font-size:10pt;color:black">It's been a long day (for other reasons),
          so I'll dig into this tomorrow. But if anyone can shed
          some light on where I should start looking, I shall be most
          obliged!</span></font><br>
<br>
<span><font face="Helvetica, Arial, sans-serif">Andy</font><br>
</span><span> </span><br>
</div>
<pre class="m_8870140236398436823moz-signature" cols="72">--
Andy Riebs
<a class="m_8870140236398436823moz-txt-link-abbreviated" href="mailto:andy.riebs@hpe.com" target="_blank">andy.riebs@hpe.com</a>
Hewlett-Packard Enterprise
High Performance Computing Software Engineering
<a href="tel:(404)%20648-9024" value="+14046489024" target="_blank">+1 404 648 9024</a>
My opinions are not necessarily those of HPE
May the source be with you!
</pre>
</div>
</blockquote></div></div>