[slurm-users] Heterogeneous job one MPI_COMM_WORLD

Gilles Gouaillardet gilles at rist.or.jp
Tue Oct 9 20:50:19 MDT 2018


This looks like a SLURM issue and Open MPI is (currently) out of the picture.

What if you run

srun --pack-group=0,1 hostname

Do you get a similar error?
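For example, a minimal sketch reusing the layout of your script (constraints and sizes copied from it), with hostname in place of the MPI binary:

==========
#!/bin/bash
#SBATCH --job-name=hetero-test
#SBATCH --time=2:00
#SBATCH --cpus-per-task=1 --mem-per-cpu=2g --ntasks=1 -C sb
#SBATCH packjob
#SBATCH --cpus-per-task=1 --mem-per-cpu=1g --ntasks=1 -C sl
# No MPI is involved here: if this srun fails the same way, the
# limitation is in Slurm's step creation, not in the MPI library.
srun --pack-group=0,1 hostname
==========

If that works (one hostname printed per task, across both components), the problem is on the MPI side; if it fails identically, it is a Slurm configuration or version issue.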


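It might also be worth checking your SchedulerParameters: since the docs mention "disable_hetero_steps", some Slurm versions may gate spanning steps behind an explicit opt-in. A sketch of what I would try, assuming your version knows the "enable_hetero_steps" option (this is a guess, not a confirmed fix; check man slurm.conf for your release first):

==========
# slurm.conf excerpt (hypothetical; verify the option exists in your version)
SchedulerParameters=enable_hetero_steps
MpiDefault=pmi2
==========

followed by "scontrol reconfigure" (or a slurmctld restart) so the change takes effect.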

On 10/10/2018 3:07 AM, Christopher Benjamin Coffey wrote:
> Hi,
> I have a user trying to set up a heterogeneous job with one MPI_COMM_WORLD, using the following:
> ==========
> #!/bin/bash
> #SBATCH --job-name=hetero
> #SBATCH --output=/scratch/cbc/hetero.txt
> #SBATCH --time=2:00
> #SBATCH --workdir=/scratch/cbc
> #SBATCH --cpus-per-task=1 --mem-per-cpu=2g --ntasks=1 -C sb
> #SBATCH packjob
> #SBATCH --cpus-per-task=1 --mem-per-cpu=1g  --ntasks=1 -C sl
> #SBATCH --mail-type=BEGIN,END
> module load openmpi/3.1.2-gcc-6.2.0
> srun --pack-group=0,1 ~/hellompi 
> ===========
> Yet, we get an error: "srun: fatal: Job steps that span multiple components of a heterogeneous job are not currently supported". But the docs seem to indicate it should work:
> IMPORTANT: The ability to execute a single application across more than one job allocation does not work with all MPI implementations or Slurm MPI plugins. Slurm's ability to execute such an application can be disabled on the entire cluster by adding "disable_hetero_steps" to Slurm's SchedulerParameters configuration parameter.
> By default, the applications launched by a single execution of the srun command (even for different components of the heterogeneous job) are combined into one MPI_COMM_WORLD with non-overlapping task IDs.
> Does this not work with Open MPI? If not, which MPI/Slurm configuration will work? We currently have MpiDefault=pmi2 in slurm.conf. I've tried a recent Open MPI, as well as MPICH and MVAPICH2.
> Any help would be appreciated, thanks!
> Best,
> Chris
> Christopher Coffey
> High-Performance Computing
> Northern Arizona University
> 928-523-1167
