[slurm-users] Error when running srun: error: task X launch failed: Invalid MPI plugin name

Josep Guerrero guerrero at ice.cat
Mon Apr 27 10:26:56 UTC 2020


Dear all,

I'm trying to install slurm, for the first time, as the queue management system
of a computing cluster. All of the nodes run Debian 10, and for OpenMPI I'm
using the distribution packages (openmpi 3.1.3):

===============
$ ompi_info
                 Package: Debian OpenMPI
                Open MPI: 3.1.3
  Open MPI repo revision: v3.1.3
   Open MPI release date: Oct 29, 2018
                Open RTE: 3.1.3
  Open RTE repo revision: v3.1.3
   Open RTE release date: Oct 29, 2018
                    OPAL: 3.1.3
      OPAL repo revision: v3.1.3
       OPAL release date: Oct 29, 2018
                 MPI API: 3.1.0
            Ident string: 3.1.3
                  Prefix: /usr
 Configured architecture: x86_64-pc-linux-gnu
          Configure host: x86-ubc-01
           Configured by: buildd
           Configured on: Tue Apr 30 22:24:39 UTC 2019
          Configure host: x86-ubc-01
  Configure command line: '--build=x86_64-linux-gnu' '--prefix=/usr'
                          '--includedir=${prefix}/include'
                          '--mandir=${prefix}/share/man'
                          '--infodir=${prefix}/share/info'
                          '--sysconfdir=/etc' '--localstatedir=/var'
                          '--disable-silent-rules'
                          '--libdir=${prefix}/lib/x86_64-linux-gnu'
                          '--libexecdir=${prefix}/lib/x86_64-linux-gnu'
                          '--runstatedir=/run' '--disable-maintainer-mode'
                          '--disable-dependency-tracking'
                          '--disable-silent-rules'
                          '--disable-wrapper-runpath'
                          '--with-package-string=Debian OpenMPI'
                          '--with-verbs' '--with-libfabric' '--with-psm2'
                          '--with-jdk-dir=/usr/lib/jvm/default-java'
                          '--enable-mpi-java'
                          '--enable-opal-btl-usnic-unit-tests'
                          '--with-libevent=external'
                          '--with-pmix=/usr/lib/x86_64-linux-gnu/pmix'
                          '--disable-silent-rules' '--enable-mpi-cxx'
                          '--with-hwloc=/usr' '--with-libltdl'
                          '--with-devel-headers' '--with-slurm' '--with-sge'
                          '--without-tm' '--sysconfdir=/etc/openmpi'
                          '--libdir=${prefix}/lib/x86_64-linux-gnu/openmpi/lib'
                          '--includedir=${prefix}/lib/x86_64-linux-gnu/openmpi/
include'
                Built by: buildd
                Built on: Tue Apr 30 22:44:58 UTC 2019
              Built host: x86-ubc-01
              C bindings: yes
            C++ bindings: yes
             Fort mpif.h: yes (all)
            Fort use mpi: yes (full: ignore TKR)
       Fort use mpi size: deprecated-ompi-info-value
        Fort use mpi_f08: yes
 Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
                          limitations in the gfortran compiler and/or Open
                          MPI, does not support the following: array
                          subsections, direct passthru (where possible) to
                          underlying Open MPI's C functionality
  Fort mpi_f08 subarrays: no
           Java bindings: yes
  Wrapper compiler rpath: rpath
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
  C compiler family name: GNU
      C compiler version: 8.3.0
            C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
           Fort compiler: gfortran
       Fort compiler abs: /usr/bin/gfortran
         Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
   Fort 08 assumed shape: yes
      Fort optional args: yes
          Fort INTERFACE: yes
    Fort ISO_FORTRAN_ENV: yes
       Fort STORAGE_SIZE: yes
      Fort BIND(C) (all): yes
      Fort ISO_C_BINDING: yes
 Fort SUBROUTINE BIND(C): yes
       Fort TYPE,BIND(C): yes
 Fort T,BIND(C,name="a"): yes
            Fort PRIVATE: yes
          Fort PROTECTED: yes
           Fort ABSTRACT: yes
       Fort ASYNCHRONOUS: yes
          Fort PROCEDURE: yes
         Fort USE...ONLY: yes
           Fort C_FUNLOC: yes
 Fort f08 using wrappers: yes
         Fort MPI_SIZEOF: yes
             C profiling: yes
           C++ profiling: yes
   Fort mpif.h profiling: yes
  Fort use mpi profiling: yes
   Fort use mpi_f08 prof: yes
          C++ exceptions: no
          Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
                          OMPI progress: no, ORTE progress: yes, Event lib:
                          yes)
           Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: yes
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
              dl support: yes
   Heterogeneous support: no
 mpirun default --prefix: no
       MPI_WTIME support: native
     Symbol vis. support: yes
   Host topology support: yes
          MPI extensions: affinity, cuda
   FT Checkpoint support: no (checkpoint thread: no)
   C/R Enabled Debugging: no
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128
           MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v3.1.3)
           MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v3.1.3)
           MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                 MCA btl: self (MCA v2.1.0, API v3.0.0, Component v3.1.3)
                 MCA btl: tcp (MCA v2.1.0, API v3.0.0, Component v3.1.3)
                 MCA btl: openib (MCA v2.1.0, API v3.0.0, Component v3.1.3)
                 MCA btl: vader (MCA v2.1.0, API v3.0.0, Component v3.1.3)
            MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v3.1.3)
            MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                 MCA crs: none (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                  MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v3.1.3)
               MCA event: external (MCA v2.1.0, API v2.0.0, Component v3.1.3)
               MCA hwloc: external (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                  MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
                          v3.1.3)
                  MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
                          v3.1.3)
         MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v3.1.3)
         MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v3.1.3)
              MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v3.1.3)
               MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v3.1.3)
             MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
                          v3.1.3)
                MCA pmix: ext2x (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                MCA pmix: flux (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v3.1.3)
               MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v3.1.3)
              MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v3.1.3)
           MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v3.1.3)
           MCA reachable: netlink (MCA v2.1.0, API v2.0.0, Component v3.1.3)
               MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v3.1.3)
               MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v3.1.3)
               MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v3.1.3)
               MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                 MCA dfs: app (MCA v2.1.0, API v1.0.0, Component v3.1.3)
                 MCA dfs: test (MCA v2.1.0, API v1.0.0, Component v3.1.3)
                 MCA dfs: orted (MCA v2.1.0, API v1.0.0, Component v3.1.3)
              MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component
                          v3.1.3)
              MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component
                          v3.1.3)
              MCA errmgr: dvm (MCA v2.1.0, API v3.0.0, Component v3.1.3)
              MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component
                          v3.1.3)
              MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component
                          v3.1.3)
                 MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v3.1.3)
                 MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v3.1.3)
                 MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component
                          v3.1.3)
                 MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v3.1.3)
                 MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v3.1.3)
                 MCA ess: env (MCA v2.1.0, API v3.0.0, Component v3.1.3)
               MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v3.1.3)
             MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v3.1.3)
                 MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                 MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                 MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v3.1.3)
            MCA notifier: syslog (MCA v2.1.0, API v1.0.0, Component v3.1.3)
                MCA odls: default (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                 MCA oob: ud (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                 MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                 MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                 MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                 MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                 MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component
                          v3.1.3)
                 MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component
                          v3.1.3)
                 MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                MCA regx: fwd (MCA v2.1.0, API v1.0.0, Component v3.1.3)
                MCA regx: reverse (MCA v2.1.0, API v1.0.0, Component v3.1.3)
               MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v3.1.3)
               MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component
                          v3.1.3)
               MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v3.1.3)
               MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v3.1.3)
               MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component
                          v3.1.3)
               MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component
                          v3.1.3)
                 MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v3.1.3)
                 MCA rml: ofi (MCA v2.1.0, API v3.0.0, Component v3.1.3)
              MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v3.1.3)
              MCA routed: debruijn (MCA v2.1.0, API v3.0.0, Component v3.1.3)
              MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v3.1.3)
              MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v3.1.3)
                 MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v3.1.3)
              MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v3.1.3)
              MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v3.1.3)
              MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v3.1.3)
              MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v3.1.3)
               MCA state: tool (MCA v2.1.0, API v1.0.0, Component v3.1.3)
               MCA state: orted (MCA v2.1.0, API v1.0.0, Component v3.1.3)
               MCA state: app (MCA v2.1.0, API v1.0.0, Component v3.1.3)
               MCA state: dvm (MCA v2.1.0, API v1.0.0, Component v3.1.3)
               MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v3.1.3)
               MCA state: novm (MCA v2.1.0, API v1.0.0, Component v3.1.3)
                 MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                MCA coll: spacc (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                MCA coll: self (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                MCA coll: monitoring (MCA v2.1.0, API v2.0.0, Component
                          v3.1.3)
                MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v3.1.3)
               MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
                          v3.1.3)
               MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
                          v3.1.3)
               MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component
                          v3.1.3)
               MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v3.1.3)
               MCA fcoll: static (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                  MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                  MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                  MCA io: romio314 (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                 MCA mtl: psm2 (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                 MCA mtl: ofi (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                 MCA mtl: psm (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                 MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v3.1.3)
                 MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v3.1.3)
                 MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component
                          v3.1.3)
                 MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v3.1.3)
                 MCA pml: v (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                 MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                 MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v3.1.3)
                 MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component
                          v3.1.3)
                 MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v3.1.3)
            MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
                          v3.1.3)
            MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v3.1.3)
            MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
                          v3.1.3)
                MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
                          v3.1.3)
                MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v3.1.3)
           MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
                          v3.1.3)
========

According to the options above, OpenMPI seems to have been compiled with the option:

 '--with-pmix=/usr/lib/x86_64-linux-gnu/pmix'

but not with "--with-pmi". On the other hand, the slurmd package provided by 
Debian:

===========
Package: slurmd
Source: slurm-llnl
Version: 18.08.5.2-1+deb10u1
Installed-Size: 784
Maintainer: Debian HPC Team <debian-hpc at lists.debian.org>
Architecture: amd64
Depends: libc6 (>= 2.27), libhwloc5 (>= 1.11.12), liblz4-1 (>= 0.0~r130), 
libnuma1 (>= 2.0.11), libpam0g (>= 0.99.7.1), libssh2-1 (>= 1.2.8), zlib1g (>= 
1:1.2.0), openssl (>= 0.9.8g-9), slurm-wlm-basic-plugins (= 
18.08.5.2-1+deb10u1), ucf, munge, lsb-base (>= 3.2-12)
Description-en: SLURM compute node daemon
 SLURM stands for Simple Linux Utility for Resource Management, it
 is an open-source cluster resource management and job scheduling system
 that strives to be simple, scalable, portable, fault-tolerant, and
 interconnect agnostic.
 This package contains the compute node demon slurmd.
Description-md5: c7a70378d04f7a2ac4844c7a91f3e281
Homepage: http://slurm.schedmd.com
Section: admin
Priority: optional
Filename: pool/main/s/slurm-llnl/slurmd_18.08.5.2-1+deb10u1_amd64.deb
Size: 373092
MD5sum: 6ff3f023cda017a30ba84d1f684bb6fb
SHA256: 3736eb356c908e4c3ac35ddbe6661d0ac48f2e88f7a1a47232581725be496e71
===========

==========
$ /usr/bin/srun --mpi=list
srun: MPI types are...
srun: openmpi
srun: pmi2
srun: none
==========

does not seem to have support for pmix, as the srun --mpi=list output above 
shows. There is an "openmpi" option, but I haven't been able to find 
documentation on how it is supposed to work. So, as I understand the 
situation, the Debian openmpi packages don't have pmi2 support, and the Debian 
slurmd package doesn't have pmix support.
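In case it is useful, the installed MPI plugins can also be checked directly on 
disk. This is just a sketch: the directory below is an assumption based on 
Debian's slurm-wlm layout, and the authoritative location is whatever PluginDir 
in slurm.conf points to:

```shell
# Sketch: list the MPI plugin files the Debian slurm packages install.
# The directory is an assumption (Debian's slurm-wlm layout); check
# PluginDir in slurm.conf (or `scontrol show config`) for the real path.
ls /usr/lib/x86_64-linux-gnu/slurm-wlm/mpi_*.so
```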

According to:

https://slurm.schedmd.com/mpi_guide.html#open_mpi

=======
"Starting with Open MPI version 3.1, PMIx version 2 is natively supported. To 
launch Open MPI application using PMIx version 2 the '--mpi=pmix_v2' option 
must be specified on the srun command line or 'MpiDefault=pmi_v2' configured in 
slurm.conf. Open MPI version 4.0, adds support for PMIx version 3 and is 
invoked in the same way, with '--mpi=pmix_v3'.

In Open MPI version 2.0, PMIx is natively supported too. To launch Open MPI 
application using PMIx the '--mpi=pmix' or '--mpi=pmix_v1' option has to be 
specified on the srun command line"
========

(I'm assuming 'MpiDefault=pmi_v2' is a typo and should be 
'MpiDefault=pmix_v2', but in any case I'm not using that option for the 
moment.)

I understand that, for this to work, slurm must have pmix support, so I 
downloaded and compiled the sources for slurm 18.08.9 (I wanted to use a 
version as close as possible to the Debian one, to make sure all system 
libraries would be compatible), this time including pmix support. The new 
version seems to start without problems, and it now reports pmix support:

==========
$ which srun
/usr/local/slurmd-18.08.9/bin/srun
$ srun --mpi=list
srun: MPI types are...
srun: pmix
srun: pmix_v3
srun: pmi2
srun: openmpi
srun: none
========
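For reference, the build was roughly along these lines (a sketch, with assumed 
paths: the prefix matches the one shown above, and the pmix path is the one 
Debian's OpenMPI was configured with):

```shell
# Sketch of the pmix-enabled slurm 18.08.9 build (paths are assumptions).
# --with-pmix points at the same PMIx installation that Debian's OpenMPI
# was configured against, so both sides use a compatible PMIx.
./configure --prefix=/usr/local/slurmd-18.08.9 \
            --with-pmix=/usr/lib/x86_64-linux-gnu/pmix
make
make install
```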

There is no pmix_v2 option, but there is a pmix option. I couldn't find a 
configure option for slurm that would force it to build the pmix_v2 plugin, 
but the documentation cited above seems to say that OpenMPI should be able to 
use pmix too. In any case, if I try to run a very simple program compiled with 
Debian's OpenMPI (which works if I run it directly with mpirun.openmpi), I get 
this error:

===========
$ srun -n 2 -N 2  --mpi=pmix -p multialt a.out
srun: error: task 0 launch failed: Invalid MPI plugin name
srun: error: task 1 launch failed: Invalid MPI plugin name
$
===========

At this point I don't know exactly what the problem is. I searched both Google 
and this list for the error, but I couldn't find any relevant information. I 
suspect it may be caused by the lack of pmix_v2 support (maybe pmix_v1 support 
was dropped in OpenMPI 3?), but then I don't know how to enable that support 
when compiling slurm. I also thought that the previous slurm installation from 
the Debian packages might be interfering in some way, but I ran strace on the 
command above, and all slurm-related file accesses seem to go to the 
directories of the version I compiled.
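One more check I can think of (sketched below; the node name and install 
prefix are placeholders for the actual ones): since the launch error is 
reported per task, I believe it comes from the slurmd on each compute node, so 
it seems worth confirming that every node is running the recompiled slurmd and 
can see the pmix plugin file:

```shell
# Sketch: "node01" and the install prefix are placeholders.
# Which slurmd version does the controller believe this node runs?
scontrol show node node01 | grep -o 'Version=[^ ]*'
# Does the pmix plugin file exist on that node?
ssh node01 'ls /usr/local/slurmd-18.08.9/lib/slurm/mpi_pmix*.so'
```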

Does anyone have any suggestions about what I could try?

Thanks for your time and attention. 

Josep Guerrero





