[slurm-users] Heterogeneous job one MPI_COMM_WORLD
Pritchard Jr., Howard
howardp at lanl.gov
Wed Oct 10 07:58:21 MDT 2018
We hit some problems at LANL trying to use this SLURM feature.
At the time, I think SchedMD said there would need to be fixes
to the SLURM PMI2 library to get this to work.
What version of SLURM are you using?
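A quick way to gather that, along with the MPI plugin settings that matter here, is something like the following (the SchedulerParameters check only matters if disable_hetero_steps might be set on your cluster):

    srun --version
    # default MPI plugin and any scheduler options that affect hetero steps
    scontrol show config | grep -E 'MpiDefault|SchedulerParameters'
    # MPI plugin types this Slurm build actually provides
    srun --mpi=list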
Office 9, 2nd floor Research Park
TA-03, Building 4200, Room 203
Los Alamos National Laboratory
On 10/9/18, 8:50 PM, "slurm-users on behalf of Gilles Gouaillardet"
<slurm-users-bounces at lists.schedmd.com on behalf of gilles at rist.or.jp> wrote:
>This looks like a SLURM issue and Open MPI is (currently) out of the picture.
>What if you
>srun --pack-group=0,1 hostname
>Do you get a similar error?
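A self-contained version of that check, reusing the two components from the script quoted further down (the sb/sl constraints are specific to that cluster), might look like:

    #!/bin/bash
    #SBATCH --ntasks=1 -C sb
    #SBATCH packjob
    #SBATCH --ntasks=1 -C sl
    # hostname needs no PMI/MPI wire-up, so a failure here points at srun itself
    # rather than at the MPI library
    srun --pack-group=0,1 hostname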
>On 10/10/2018 3:07 AM, Christopher Benjamin Coffey wrote:
>> I have a user trying to setup a heterogeneous job with one
>>MPI_COMM_WORLD with the following:
>> #SBATCH --job-name=hetero
>> #SBATCH --output=/scratch/cbc/hetero.txt
>> #SBATCH --time=2:00
>> #SBATCH --workdir=/scratch/cbc
>> #SBATCH --cpus-per-task=1 --mem-per-cpu=2g --ntasks=1 -C sb
>> #SBATCH packjob
>> #SBATCH --cpus-per-task=1 --mem-per-cpu=1g --ntasks=1 -C sl
>> #SBATCH --mail-type=BEGIN,END
>> module load openmpi/3.1.2-gcc-6.2.0
>> srun --pack-group=0,1 ~/hellompi
>> Yet, we get an error: " srun: fatal: Job steps that span multiple
>>components of a heterogeneous job are not currently supported". But the
>>docs seem to indicate it should work?
>> IMPORTANT: The ability to execute a single application across more than
>>one job allocation does not work with all MPI implementations or Slurm
>>MPI plugins. Slurm's ability to execute such an application can be
>>disabled on the entire cluster by adding "disable_hetero_steps" to
>>Slurm's SchedulerParameters configuration parameter.
>> By default, the applications launched by a single execution of the srun
>>command (even for different components of the heterogeneous job) are
>>combined into one MPI_COMM_WORLD with non-overlapping task IDs.
>> Does this not work with Open MPI? If not, which MPI/Slurm configuration
>>will work? We have MpiDefault=pmi2 in slurm.conf currently. I've tried a
>>recent Open MPI, as well as MPICH and MVAPICH2.
>> Any help would be appreciated, thanks!
>> Christopher Coffey
>> High-Performance Computing
>> Northern Arizona University
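Given the PMI2 caveat above, one configuration worth ruling in or out: if this Slurm build includes the PMIx plugin and the Open MPI install was built against a compatible PMIx, the step can be launched with that plugin instead of pmi2. A sketch of the usual commands, not a confirmed fix for the hetero-step error:

    # see which MPI plugin types this Slurm build offers (e.g. pmi2, pmix)
    srun --mpi=list
    # if pmix is listed, try it for the heterogeneous step
    srun --mpi=pmix --pack-group=0,1 ~/hellompi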