<div dir="ltr">also please post the output of<div>$ srun --mpi=list</div><div><br></div><div>When job crashes - is there any error messages in the relevant slurmd.log's or output on the screen?</div></div><div class="gmail_extra"><br><div class="gmail_quote">2017-12-07 9:49 GMT-08:00 Artem Polyakov <span dir="ltr"><<a href="mailto:artpol84@gmail.com" target="_blank">artpol84@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hello,<div><br></div><div>what is the value of MpiDefault option in your Slurm configuration file?</div></div><div class="gmail_extra"><div><div class="h5"><br><div class="gmail_quote">2017-12-07 9:37 GMT-08:00 Glenn (Gedaliah) Wolosh <span dir="ltr"><<a href="mailto:gwolosh@njit.edu" target="_blank">gwolosh@njit.edu</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div>Hello</div><div><br></div><div>This is using Slurm version - 17.02.6 running on Scientific Linux release 7.4 (Nitrogen)</div><div><br></div><div><div>[gwolosh@p-slogin bin]$ module li</div><div><br></div><div>Currently Loaded Modules:</div><div> 1) GCCcore/.5.4.0 (H) 2) binutils/.2.26 (H) 3) GCC/5.4.0-2.26 4) numactl/2.0.11 5) hwloc/1.11.3 6) OpenMPI/1.10.3</div></div><div><br></div><div>If I run</div><div><br></div><div>srun --nodes=8 --ntasks-per-node=8 --ntasks=64 ./ep.C.64</div><div><br></div><div>It runs successfuly but I get a message —</div><div><br></div><div><div>PMI2 initialized but returned bad values for size/rank/jobid.</div><div>This is symptomatic of either a failure to use the</div><div>"--mpi=pmi2" flag in SLURM, or a borked PMI2 installation.</div><div>If running under SLURM, try adding "-mpi=pmi2" to your</div><div>srun command line. 

2017-12-07 9:49 GMT-08:00 Artem Polyakov <artpol84@gmail.com>:
> Hello,
>
> What is the value of the MpiDefault option in your Slurm configuration file?
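> In slurm.conf this is a single line, for example (pmi2 here is just an
> illustration; none and openmpi are the other common values):
>
>     MpiDefault=pmi2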
>
> 2017-12-07 9:37 GMT-08:00 Glenn (Gedaliah) Wolosh <gwolosh@njit.edu>:
>> Hello,
>>
>> This is Slurm version 17.02.6 running on Scientific Linux release 7.4 (Nitrogen).
>>
>>     [gwolosh@p-slogin bin]$ module li
>>
>>     Currently Loaded Modules:
>>       1) GCCcore/.5.4.0 (H)   2) binutils/.2.26 (H)   3) GCC/5.4.0-2.26
>>       4) numactl/2.0.11       5) hwloc/1.11.3         6) OpenMPI/1.10.3
>>
>> If I run
>>
>>     srun --nodes=8 --ntasks-per-node=8 --ntasks=64 ./ep.C.64
>>
>> it runs successfully, but I get a message:
>>
>>     PMI2 initialized but returned bad values for size/rank/jobid.
>>     This is symptomatic of either a failure to use the
>>     "--mpi=pmi2" flag in SLURM, or a borked PMI2 installation.
>>     If running under SLURM, try adding "-mpi=pmi2" to your
>>     srun command line. If that doesn't work, or if you are
>>     not running under SLURM, try removing or renaming the
>>     pmi2.h header file so PMI2 support will not automatically
>>     be built, reconfigure and build OMPI, and then try again
>>     with only PMI1 support enabled.
>>
>> If I run
>>
>>     srun --nodes=8 --ntasks-per-node=8 --ntasks=64 --mpi=pmi2 ./ep.C.64
>>
>> the job crashes.
>>
>> If I run via sbatch:
>>
>>     #!/bin/bash
>>     # Job name:
>>     #SBATCH --job-name=nas_bench
>>     #SBATCH --nodes=8
>>     #SBATCH --ntasks=64
>>     #SBATCH --ntasks-per-node=8
>>     #SBATCH --time=48:00:00
>>     #SBATCH --output=nas.out.1
>>     #
>>     ## Command(s) to run (example):
>>     module use $HOME/easybuild/modules/all/Core
>>     module load GCC/5.4.0-2.26 OpenMPI/1.10.3
>>     mpirun -np 64 ./ep.C.64
>>
>> the job also crashes.
>>
>> Using EasyBuild, these are my configure options for OpenMPI:
>>
>>     configopts = '--with-threads=posix --enable-shared --enable-mpi-thread-multiple --with-verbs '
>>     configopts += '--enable-mpirun-prefix-by-default '  # suppress failure modes in relation to mpirun path
>>     configopts += '--with-hwloc=$EBROOTHWLOC '          # hwloc support
>>     configopts += '--disable-dlopen '                   # statically link components, don't do dynamic loading
>>     configopts += '--with-slurm --with-pmi '
>>
>> And finally:
>>
>>     $ ldd /opt/local/easybuild/software/Compiler/GCC/5.4.0-2.26/OpenMPI/1.10.3/bin/orterun | grep pmi
>>             libpmi.so.0 => /usr/lib64/libpmi.so.0 (0x00007f0129d6d000)
>>             libpmi2.so.0 => /usr/lib64/libpmi2.so.0 (0x00007f0129b51000)
>>
>>     $ ompi_info | grep pmi
>>                      MCA db: pmi (MCA v2.0.0, API v1.0.0, Component v1.10.3)
>>                     MCA ess: pmi (MCA v2.0.0, API v3.0.0, Component v1.10.3)
>>                 MCA grpcomm: pmi (MCA v2.0.0, API v2.0.0, Component v1.10.3)
>>                  MCA pubsub: pmi (MCA v2.0.0, API v2.0.0, Component v1.10.3)
>>
>> Any suggestions?
>>
>> _______________
>> Gedaliah Wolosh
>> IST Academic and Research Computing Systems (ARCS)
>> NJIT
>> GITC 2203
>> 973 596 5437
>> gwolosh@njit.edu
>
> --
> Best regards, Artem Y. Polyakov

--
Best regards, Artem Y. Polyakov