[slurm-users] Slurm + IntelMPI

Hermann Schwärzler hermann.schwaerzler at uibk.ac.at
Tue Mar 21 16:58:58 UTC 2023


Hi everybody,

in our new cluster we have configured Slurm with

SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup

which I think is a fairly common setup.

After installing Intel MPI (using Spack v0.19) we found a serious
problem with task distribution when using its mpirun utility - see
this output of top after submitting an mpi_hello job with 6 tasks to
one node:

[...]
  P COMMAND
 33  `- slurmstepd: [98749.extern]
 34      `- sleep 100000000
 32  `- slurmstepd: [98749.batch]
 32      `- /bin/bash /var/spool/slurm/slurmd/job98749/slurm_script
 33          `- /bin/sh /path/to/mpirun bash -c ./mpi_hello_world; sleep 30
  2              `- mpiexec.hydra bash -c ./mpi_hello_world; sleep 30
  1                  `- /usr/slurm/bin/srun -N 1 -n 1 --ntasks-per-node 1 --nodelist n054 --input none /path/to/hydra_bstrap_proxy ...
  0                      `- /usr/slurm/bin/srun -N 1 -n 1 --ntasks-per-node 1 --nodelist n054 --input none /path/to/hydra_bstrap_proxy ...
 32  `- slurmstepd: [98749.0]
  0      `- /path/to/hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9
  0          `- bash -c ./mpi_hello_world; sleep 30
  0              `- sleep 30
  0          `- bash -c ./mpi_hello_world; sleep 30
  0              `- sleep 30
  0          `- bash -c ./mpi_hello_world; sleep 30
  0              `- sleep 30
  0          `- bash -c ./mpi_hello_world; sleep 30
  0              `- sleep 30
  0          `- bash -c ./mpi_hello_world; sleep 30
  0              `- sleep 30
  0          `- bash -c ./mpi_hello_world; sleep 30
  0              `- sleep 30
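
For reference, the batch script behind this is essentially the
following (a sketch; the real script first sets up our Spack-installed
Intel MPI, and paths are simplified):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=6

# Intel MPI's mpirun is on PATH at this point
mpirun bash -c './mpi_hello_world; sleep 30'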

You can see what happens: mpirun starts mpiexec.hydra, which starts
srun (with options "-N 1 -n 1") to launch hydra_bstrap_proxy. This of
course creates a new job step, in which hydra_bstrap_proxy runs
hydra_pmi_proxy to finally start our six instances of the desired
program.
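
(The extra job step created by that srun also shows up in accounting;
something like the following should list it, alongside the extern and
batch steps:

sacct -j 98749 --format=JobID,JobName,AllocCPUS
)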

The problem in our setup is that this srun explicitly asks for only
one task! So its job step gets constrained to one task (and one CPU),
and as a result *all six tasks run on a single CPU* (see the "P"
column of top). :-(
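
You can verify the constraint on the compute node itself, e.g.
(assuming hydra_pmi_proxy is the only matching process on the node):

$ grep Cpus_allowed_list /proc/$(pgrep -f hydra_pmi_proxy)/status
Cpus_allowed_list:      0

which matches what top shows.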

I found documentation on the internet where others seem to have run
into similar problems and recommend that their users use srun instead
of mpirun with Intel MPI.
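
If I understand correctly, that would mean replacing the mpirun line
with something like this (a sketch; the exact path to Slurm's
libpmi2.so depends on the installation):

export I_MPI_PMI_LIBRARY=/usr/lib64/slurm/libpmi2.so
srun --mpi=pmi2 bash -c './mpi_hello_world; sleep 30'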

Is this really the only "solution" to this problem?
Or are there other ones?

Regards,
Hermann


