[slurm-users] NAS benchmarks - problem with openmpi, slurm and pmi
Glenn (Gedaliah) Wolosh
gwolosh at njit.edu
Thu Dec 7 10:37:01 MST 2017
Hello
This is Slurm version 17.02.6 running on Scientific Linux release 7.4 (Nitrogen).
[gwolosh at p-slogin bin]$ module li
Currently Loaded Modules:
1) GCCcore/.5.4.0 (H) 2) binutils/.2.26 (H) 3) GCC/5.4.0-2.26 4) numactl/2.0.11 5) hwloc/1.11.3 6) OpenMPI/1.10.3
If I run
srun --nodes=8 --ntasks-per-node=8 --ntasks=64 ./ep.C.64
It runs successfully, but I get a message:
PMI2 initialized but returned bad values for size/rank/jobid.
This is symptomatic of either a failure to use the
"--mpi=pmi2" flag in SLURM, or a borked PMI2 installation.
If running under SLURM, try adding "-mpi=pmi2" to your
srun command line. If that doesn't work, or if you are
not running under SLURM, try removing or renaming the
pmi2.h header file so PMI2 support will not automatically
be built, reconfigure and build OMPI, and then try again
with only PMI1 support enabled.
If I run
srun --nodes=8 --ntasks-per-node=8 --ntasks=64 --mpi=pmi2 ./ep.C.64
The job crashes
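(I have not yet confirmed that the pmi2 plugin is actually present in this Slurm build; I assume it can be listed with something like

srun --mpi=list

but I am not certain that is where the problem lies.)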
If I run via sbatch:
#!/bin/bash
# Job name:
#SBATCH --job-name=nas_bench
#SBATCH --nodes=8
#SBATCH --ntasks=64
#SBATCH --ntasks-per-node=8
#SBATCH --time=48:00:00
#SBATCH --output=nas.out.1
#
## Command(s) to run (example):
module use $HOME/easybuild/modules/all/Core
module load GCC/5.4.0-2.26 OpenMPI/1.10.3
mpirun -np 64 ./ep.C.64
the job also crashes.
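Would launching with srun instead of mpirun inside the batch script be expected to behave differently? A minimal sketch of what I have in mind, assuming the same binary and task layout as above:

srun --mpi=pmi2 --ntasks=64 --ntasks-per-node=8 ./ep.C.64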
Using EasyBuild, these are my configure options for OpenMPI:
configopts = '--with-threads=posix --enable-shared --enable-mpi-thread-multiple --with-verbs '
configopts += '--enable-mpirun-prefix-by-default ' # suppress failure modes in relation to mpirun path
configopts += '--with-hwloc=$EBROOTHWLOC ' # hwloc support
configopts += '--disable-dlopen ' # statically link component, don't do dynamic loading
configopts += '--with-slurm --with-pmi '
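(I am assuming configure picks up the PMI headers and libraries from the OS location under /usr; if it helps, I could try pointing it at the install prefix explicitly, along the lines of

configopts += '--with-pmi=/usr '

where /usr is just my guess at the right prefix based on the ldd output below.)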
And finally:
$ ldd /opt/local/easybuild/software/Compiler/GCC/5.4.0-2.26/OpenMPI/1.10.3/bin/orterun | grep pmi
libpmi.so.0 => /usr/lib64/libpmi.so.0 (0x00007f0129d6d000)
libpmi2.so.0 => /usr/lib64/libpmi2.so.0 (0x00007f0129b51000)
$ ompi_info | grep pmi
MCA db: pmi (MCA v2.0.0, API v1.0.0, Component v1.10.3)
MCA ess: pmi (MCA v2.0.0, API v3.0.0, Component v1.10.3)
MCA grpcomm: pmi (MCA v2.0.0, API v2.0.0, Component v1.10.3)
MCA pubsub: pmi (MCA v2.0.0, API v2.0.0, Component v1.10.3)
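I also have not ruled out a mismatch on the Slurm side; I assume the cluster's default MPI plugin setting can be checked with something like

scontrol show config | grep -i MpiDefault

though I am not sure how that interacts with the OpenMPI build.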
Any suggestions?
_______________
Gedaliah Wolosh
IST Academic and Research Computing Systems (ARCS)
NJIT
GITC 2203
973 596 5437
gwolosh at njit.edu