<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Occasionally, when Open MPI SHMEM jobs exit, we see the following
message:<br>
<br>
<code>srun: fatal: ../../../src/api/step_launch.c:1037
step_launch_state_destroy: pthread_mutex_destroy(): Device or
resource busy</code><br>
<br>
Our environment:<br>
<ul>
<li>KNL (Intel Xeon Phi Knights Landing) cluster with 100+ nodes</li>
<li>CentOS 7.4</li>
<li>Open MPI 3.x (an interim kit between 3.0 and 3.1)</li>
<li>Slurm 17.11.0</li>
</ul>
This was reported in SchedMD bug 4333 at
<a class="moz-txt-link-rfc2396E" href="https://bugs.schedmd.com/show_bug.cgi?id=4333"><https://bugs.schedmd.com/show_bug.cgi?id=4333></a> against a
17.11.0 release candidate kit, but we are now seeing it with the final
17.11.0 release (I confirmed that Moe's fix is present in our sources).
Has anyone else seen this? Better yet, has anyone found a way to fix it?<br>
<br>
Andy<br>
<pre class="moz-signature" cols="72">--
Andy Riebs
<a class="moz-txt-link-abbreviated" href="mailto:andy.riebs@hpe.com">andy.riebs@hpe.com</a>
Hewlett-Packard Enterprise
High Performance Computing Software Engineering
+1 404 648 9024
My opinions are not necessarily those of HPE
May the source be with you!
</pre>
</body>
</html>