[slurm-users] slurm does not pass mca params to openmpi?
Roger Mason
rmason at mun.ca
Thu Jul 19 06:04:16 MDT 2018
Hello,
I've run into a problem passing MCA parameters to openmpi2. This runs
fine on the command-line:
/usr/local/mpi/openmpi2/bin/mpirun --mca btl_tcp_if_include \
192.168.0.0/24 -np 10 -hostfile ~/ompi.hosts \
~/Software/Gulp/gulp-5.0/gulp.ompi example2
If I put the MCA parameters in ~/openmpi/mca-params.conf:
btl_tcp_if_include=192.168.0.0/24
then this command-line also works:
/usr/local/mpi/openmpi2/bin/mpirun -np 10 -hostfile ~/ompi.hosts \
~/Software/Gulp/gulp-5.0/gulp.ompi example2
When I run this batch file (e2.sh):
#!/bin/sh
#SBATCH --output e2_%j.out
#SBATCH --export=ALL
# The mca parameter stops OpenMPI getting confused about which interface to use.
#/usr/local/mpi/openmpi2/bin/mpirun --mca btl_tcp_if_include 192.168.0.0/24 ~/Software/Gulp/gulp-5.0/gulp.ompi example2
/usr/local/mpi/openmpi2/bin/mpirun ~/Software/Gulp/gulp-5.0/gulp.ompi example2
like this:
sbatch -N 2 e2.sh
The .out file contains:
--------------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
--------------------------------------------------------------------------
I'm using openmpi 2.1.3 and slurm-wlm-17.02.10 on FreeBSD.
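One further variant that might be worth testing is sketched below. It is untested here and rests on two assumptions: that the job environment really is propagated by --export=ALL, and that the interface selection is what goes wrong under sbatch. Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>, so the batch file can set the value itself and print what the job actually sees before calling mpirun. The MPIRUN variable and the -x guard are only illustrative additions; the paths are the ones used above.

```shell
#!/bin/sh
#SBATCH --output e2_%j.out
#SBATCH --export=ALL

# Sketch only: set the MCA parameter through the environment instead of
# the mpirun command line or mca-params.conf.
OMPI_MCA_btl_tcp_if_include=192.168.0.0/24
export OMPI_MCA_btl_tcp_if_include

# Record which OMPI_MCA_* variables are visible inside the job, so the
# .out file shows whether the setting reached the batch environment.
env | grep '^OMPI_MCA_'

# MPIRUN is an illustrative variable; the guard just skips the launch
# on a machine where this Open MPI install is not present.
MPIRUN=/usr/local/mpi/openmpi2/bin/mpirun
if [ -x "$MPIRUN" ]; then
    "$MPIRUN" "$HOME/Software/Gulp/gulp-5.0/gulp.ompi" example2
fi
```

If the env line shows the variable in the .out file and the ORTE error still appears, the parameter is reaching the job and the failure lies elsewhere.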
Any help will be much appreciated.
Roger