[slurm-users] slurm does not pass mca params to openmpi?

Roger Mason rmason at mun.ca
Thu Jul 19 06:04:16 MDT 2018


Hello,

I've run into a problem passing MCA parameters to openmpi2.  This runs
fine on the command line:

/usr/local/mpi/openmpi2/bin/mpirun --mca btl_tcp_if_include \
192.168.0.0/24 -np 10 -hostfile ~/ompi.hosts \
~/Software/Gulp/gulp-5.0/gulp.ompi example2

If I put the MCA parameters in ~/openmpi/mca-params.conf:
btl_tcp_if_include=192.168.0.0/24

then this command line also works:

/usr/local/mpi/openmpi2/bin/mpirun -np 10 -hostfile ~/ompi.hosts \
~/Software/Gulp/gulp-5.0/gulp.ompi example2
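
For completeness, Open MPI also accepts an MCA parameter as an environment
variable named OMPI_MCA_<param>. The following is only a sketch of that
third route, not something tested on this cluster; the mpirun line is left
commented out and reuses the paths from the commands above.

```shell
#!/bin/sh
# Assumption: Open MPI treats an environment variable OMPI_MCA_<param>
# as the MCA parameter <param>, so this stands in for the --mca flag.
export OMPI_MCA_btl_tcp_if_include=192.168.0.0/24

# Show the MCA settings that a child mpirun would inherit.
env | grep '^OMPI_MCA_'

# The run itself would then need no --mca option:
# /usr/local/mpi/openmpi2/bin/mpirun -np 10 -hostfile ~/ompi.hosts \
#     ~/Software/Gulp/gulp-5.0/gulp.ompi example2
```

Since the batch file further down uses --export=ALL, a variable exported
in the submitting shell should in principle reach the job as well.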

When I run this batch file (e2.sh):
#!/bin/sh

#SBATCH --output e2_%j.out
#SBATCH --export=ALL

# The mca parameter stops OpenMPI getting confused about which interface to use.
#/usr/local/mpi/openmpi2/bin/mpirun --mca btl_tcp_if_include 192.168.0.0/24 ~/Software/Gulp/gulp-5.0/gulp.ompi example2
/usr/local/mpi/openmpi2/bin/mpirun ~/Software/Gulp/gulp-5.0/gulp.ompi example2

like this:

sbatch -N 2 e2.sh

The .out file contains:
--------------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
--------------------------------------------------------------------------

I'm using openmpi 2.1.3 and slurm-wlm-17.02.10 on FreeBSD.
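
One check that might narrow this down is whether the job, as run on the
compute node, sees the same home directory and parameter file as my
interactive shell. A minimal sketch that could go at the top of e2.sh;
check_mca_param is a hypothetical helper, not part of Slurm or Open MPI,
and the second path checked is Open MPI's documented default per-user
location, which may differ from the one quoted above.

```shell
#!/bin/sh
# check_mca_param FILE KEY: report whether FILE is readable and sets KEY.
# Hypothetical helper for illustration only.
check_mca_param() {
    file=$1
    key=$2
    if [ ! -r "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    # Match "key=value" or "key = value" at the start of a line.
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
        echo "set: $key in $file"
        return 0
    fi
    echo "unset: $key in $file"
    return 1
}

# Record where the job is running and which home directory it sees.
echo "host: $(uname -n)  HOME: $HOME"

# Path as quoted in this message, then Open MPI's default per-user path.
check_mca_param "$HOME/openmpi/mca-params.conf" btl_tcp_if_include || true
check_mca_param "$HOME/.openmpi/mca-params.conf" btl_tcp_if_include || true
```

The output lands in the e2_%j.out file alongside any mpirun messages.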

Any help will be much appreciated.

Roger



