<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Hello</div><div class=""><br class=""></div><div class="">This is using Slurm version 17.02.6 running on Scientific Linux release 7.4 (Nitrogen).</div><div class=""><br class=""></div><div class=""><div class="">[gwolosh@p-slogin bin]$ module li</div><div class=""><br class=""></div><div class="">Currently Loaded Modules:</div><div class=""> 1) GCCcore/.5.4.0 (H) 2) binutils/.2.26 (H) 3) GCC/5.4.0-2.26 4) numactl/2.0.11 5) hwloc/1.11.3 6) OpenMPI/1.10.3</div></div><div class=""><br class=""></div><div class="">If I run</div><div class=""><br class=""></div><div class="">srun --nodes=8 --ntasks-per-node=8 --ntasks=64 ./ep.C.64</div><div class=""><br class=""></div><div class="">it runs successfully, but I get this message:</div><div class=""><br class=""></div><div class=""><div class="">PMI2 initialized but returned bad values for size/rank/jobid.</div><div class="">This is symptomatic of either a failure to use the</div><div class="">"--mpi=pmi2" flag in SLURM, or a borked PMI2 installation.</div><div class="">If running under SLURM, try adding "-mpi=pmi2" to your</div><div class="">srun command line. 
If that doesn't work, or if you are</div><div class="">not running under SLURM, try removing or renaming the</div><div class="">pmi2.h header file so PMI2 support will not automatically</div><div class="">be built, reconfigure and build OMPI, and then try again</div><div class="">with only PMI1 support enabled.</div><div class=""><br class=""></div><div class="">If I run</div><div class=""><br class=""></div><div class="">srun --nodes=8 --ntasks-per-node=8 --ntasks=64 --mpi=pmi2 ./ep.C.64</div><div class=""><br class=""></div><div class="">the job crashes.</div><div class=""><br class=""></div><div class="">If I run via sbatch:</div><div class=""><br class=""></div><div class=""><div class="">#!/bin/bash</div><div class=""># Job name:</div><div class="">#SBATCH --job-name=nas_bench</div><div class="">#SBATCH --nodes=8</div><div class="">#SBATCH --ntasks=64</div><div class="">#SBATCH --ntasks-per-node=8</div><div class="">#SBATCH --time=48:00:00</div><div class="">#SBATCH --output=nas.out.1</div><div class="">#</div><div class="">## Command(s) to run (example):</div><div class="">module use $HOME/easybuild/modules/all/Core</div><div class="">module load GCC/5.4.0-2.26 OpenMPI/1.10.3</div><div class="">mpirun -np 64 ./ep.C.64</div></div><div class=""><br class=""></div><div class="">the job also crashes.</div><div class=""><br class=""></div><div class="">These are my EasyBuild config options for Open MPI:</div><div class=""><br class=""></div><div class=""><div class="">configopts = '--with-threads=posix --enable-shared --enable-mpi-thread-multiple --with-verbs '</div><div class="">configopts += '--enable-mpirun-prefix-by-default ' # suppress failure modes in relation to mpirun path</div><div class="">configopts += '--with-hwloc=$EBROOTHWLOC ' # hwloc support</div><div class="">configopts += '--disable-dlopen ' # statically link component, don't do dynamic loading</div><div class="">configopts += '--with-slurm --with-pmi '</div></div><div class=""><br 
class=""></div><div class="">And finally, the PMI libraries and components this build picked up:</div><div class=""><br class=""></div><div class=""><div class="">$ ldd /opt/local/easybuild/software/Compiler/GCC/5.4.0-2.26/OpenMPI/1.10.3/bin/orterun | grep pmi</div><div class=""> libpmi.so.0 => /usr/lib64/libpmi.so.0 (0x00007f0129d6d000)</div><div class=""> libpmi2.so.0 => /usr/lib64/libpmi2.so.0 (0x00007f0129b51000)</div></div><div class=""><br class=""></div><div class=""><div class="">$ ompi_info | grep pmi</div><div class=""> MCA db: pmi (MCA v2.0.0, API v1.0.0, Component v1.10.3)</div><div class=""> MCA ess: pmi (MCA v2.0.0, API v3.0.0, Component v1.10.3)</div><div class=""> MCA grpcomm: pmi (MCA v2.0.0, API v2.0.0, Component v1.10.3)</div><div class=""> MCA pubsub: pmi (MCA v2.0.0, API v2.0.0, Component v1.10.3)</div></div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">Any suggestions?</div></div><div class="">
<div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">_______________<br class="">Gedaliah Wolosh<br class="">IST Academic and Research Computing Systems (ARCS)<br class="">NJIT<br class="">GITC 2203<br class="">973 596 5437<br class=""><a href="mailto:gwolosh@njit.edu" class="">gwolosh@njit.edu</a><br class=""></div></div>
</div>
<br class=""></body></html>