[slurm-users] NAS benchmarks - problem with openmpi, slurm and pmi
Artem Polyakov
artpol84 at gmail.com
Thu Dec 7 10:49:23 MST 2017
Hello,
what is the value of MpiDefault option in your Slurm configuration file?
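A quick way to check (assuming you have access to the cluster configuration) is:

  scontrol show config | grep MpiDefault

If it reports "none" (the Slurm default), srun does not start a PMI2 server
for the job step unless you pass --mpi=pmi2; setting MpiDefault=pmi2 in
slurm.conf makes that the default. "srun --mpi=list" shows which MPI plugins
your srun supports.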
2017-12-07 9:37 GMT-08:00 Glenn (Gedaliah) Wolosh <gwolosh at njit.edu>:
> Hello
>
> This is Slurm version 17.02.6 running on Scientific Linux release 7.4 (Nitrogen)
>
> [gwolosh at p-slogin bin]$ module li
>
> Currently Loaded Modules:
>   1) GCCcore/.5.4.0 (H)   2) binutils/.2.26 (H)   3) GCC/5.4.0-2.26
>   4) numactl/2.0.11       5) hwloc/1.11.3         6) OpenMPI/1.10.3
>
> If I run
>
> srun --nodes=8 --ntasks-per-node=8 --ntasks=64 ./ep.C.64
>
> It runs successfully, but I get a message:
>
> PMI2 initialized but returned bad values for size/rank/jobid.
> This is symptomatic of either a failure to use the
> "--mpi=pmi2" flag in SLURM, or a borked PMI2 installation.
> If running under SLURM, try adding "-mpi=pmi2" to your
> srun command line. If that doesn't work, or if you are
> not running under SLURM, try removing or renaming the
> pmi2.h header file so PMI2 support will not automatically
> be built, reconfigure and build OMPI, and then try again
> with only PMI1 support enabled.
>
> If I run
>
> srun --nodes=8 --ntasks-per-node=8 --ntasks=64 --mpi=pmi2 ./ep.C.64
>
> The job crashes
>
> If I run via sbatch:
>
> #!/bin/bash
> # Job name:
> #SBATCH --job-name=nas_bench
> #SBATCH --nodes=8
> #SBATCH --ntasks=64
> #SBATCH --ntasks-per-node=8
> #SBATCH --time=48:00:00
> #SBATCH --output=nas.out.1
> #
> ## Command(s) to run (example):
> module use $HOME/easybuild/modules/all/Core
> module load GCC/5.4.0-2.26 OpenMPI/1.10.3
> mpirun -np 64 ./ep.C.64
>
> the job also crashes.
>
> Using EasyBuild, these are my config options for OpenMPI:
>
> configopts = '--with-threads=posix --enable-shared --enable-mpi-thread-multiple --with-verbs '
> configopts += '--enable-mpirun-prefix-by-default '  # suppress failure modes in relation to mpirun path
> configopts += '--with-hwloc=$EBROOTHWLOC '  # hwloc support
> configopts += '--disable-dlopen '  # statically link component, don't do dynamic loading
> configopts += '--with-slurm --with-pmi '
>
> And finally:
>
> $ ldd /opt/local/easybuild/software/Compiler/GCC/5.4.0-2.26/OpenMPI/1.10.3/bin/orterun | grep pmi
> libpmi.so.0 => /usr/lib64/libpmi.so.0 (0x00007f0129d6d000)
> libpmi2.so.0 => /usr/lib64/libpmi2.so.0 (0x00007f0129b51000)
>
> $ ompi_info | grep pmi
> MCA db: pmi (MCA v2.0.0, API v1.0.0, Component v1.10.3)
> MCA ess: pmi (MCA v2.0.0, API v3.0.0, Component v1.10.3)
> MCA grpcomm: pmi (MCA v2.0.0, API v2.0.0, Component v1.10.3)
> MCA pubsub: pmi (MCA v2.0.0, API v2.0.0, Component v1.10.3)
>
>
> Any suggestions?
> _______________
> Gedaliah Wolosh
> IST Academic and Research Computing Systems (ARCS)
> NJIT
> GITC 2203
> 973 596 5437
> gwolosh at njit.edu
>
>
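For reference, the usual way to launch this under Slurm's PMI2 support is srun
inside the batch script rather than mpirun. A minimal sketch, assuming the
pmi2 plugin is available on the compute nodes:

  #!/bin/bash
  #SBATCH --job-name=nas_bench
  #SBATCH --nodes=8
  #SBATCH --ntasks=64
  #SBATCH --ntasks-per-node=8
  #SBATCH --time=48:00:00
  #SBATCH --output=nas.out.1

  module use $HOME/easybuild/modules/all/Core
  module load GCC/5.4.0-2.26 OpenMPI/1.10.3

  # srun inherits the task geometry from the #SBATCH options above,
  # so no -np argument is needed.
  srun --mpi=pmi2 ./ep.C.64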
--
С Уважением, Поляков Артем Юрьевич
Best regards, Artem Y. Polyakov