[slurm-users] Heterogeneous job one MPI_COMM_WORLD
Christopher Benjamin Coffey
Chris.Coffey at nau.edu
Tue Oct 9 12:07:33 MDT 2018
Hi,
I have a user trying to set up a heterogeneous job with one MPI_COMM_WORLD using the following:
==========
#!/bin/bash
#SBATCH --job-name=hetero
#SBATCH --output=/scratch/cbc/hetero.txt
#SBATCH --time=2:00
#SBATCH --workdir=/scratch/cbc
#SBATCH --cpus-per-task=1 --mem-per-cpu=2g --ntasks=1 -C sb
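# the "packjob" line below separates the two components of the heterogeneous job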
#SBATCH packjob
#SBATCH --cpus-per-task=1 --mem-per-cpu=1g --ntasks=1 -C sl
#SBATCH --mail-type=BEGIN,END
module load openmpi/3.1.2-gcc-6.2.0
srun --pack-group=0,1 ~/hellompi
===========
Yet we get an error: "srun: fatal: Job steps that span multiple components of a heterogeneous job are not currently supported". The docs seem to indicate this should work, though:
IMPORTANT: The ability to execute a single application across more than one job allocation does not work with all MPI implementations or Slurm MPI plugins. Slurm's ability to execute such an application can be disabled on the entire cluster by adding "disable_hetero_steps" to Slurm's SchedulerParameters configuration parameter.
By default, the applications launched by a single execution of the srun command (even for different components of the heterogeneous job) are combined into one MPI_COMM_WORLD with non-overlapping task IDs.
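For reference, this is how I have been checking which MPI plugin types our srun was built with, and whether heterogeneous steps are disabled cluster-wide (output omitted; the plugin names will depend on how Slurm was built):
==========
# list the MPI plugin types this srun supports (e.g. none, pmi2, pmix)
srun --mpi=list

# check whether "disable_hetero_steps" appears in SchedulerParameters
scontrol show config | grep -i SchedulerParameters
==========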
Does this not work with OpenMPI? If not, which MPI/Slurm configuration will work? We currently have MpiDefault=pmi2 in slurm.conf. I've tried a modern OpenMPI, as well as MPICH and MVAPICH2.
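In case it matters, the next thing I was planning to try is launching the step through the pmix plugin instead of pmi2. This is only a sketch, and it assumes our Slurm was built with the mpi/pmix plugin and that the OpenMPI module was built against the same PMIx library, neither of which I have confirmed yet:
==========
module load openmpi/3.1.2-gcc-6.2.0
# request the PMIx plugin explicitly rather than relying on MpiDefault=pmi2
srun --mpi=pmix --pack-group=0,1 ~/hellompi
==========
The cluster-wide equivalent would presumably be setting MpiDefault=pmix in slurm.conf, but I don't know whether the pmi2 default is actually what triggers the fatal error above.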
Any help would be appreciated, thanks!
Best,
Chris
—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167