[slurm-users] problems with slurm and openmpi

Riccardo Veraldi riccardo.veraldi at gmail.com
Fri Mar 15 05:12:13 UTC 2019

I missed that step then, building PMIx separately. I thought the PMIx embedded inside Open MPI could be used by Slurm.
>  
> On Mar 14, 2019 at 9:32 PM, Gilles Gouaillardet <gilles at rist.or.jp> wrote:
>
>  Riccardo, 
>
>
> I am a bit confused by your explanation. 
>
>
> Open MPI does embed PMIx, but only for itself. 
>
> Another way to put it: you have to install PMIx first (from a package, or 
> downloaded from pmix.org) 
>
> and then build SLURM on top of it. 
>
>
> Then you can build Open MPI with the same (external) PMIx or the 
> embedded one 
>
> (since PMIx offers cross-version compatibility support) 
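>
> For example, a build sequence along these lines should work (an untested
> sketch; the PMIx version, directory names and install prefix are only
> illustrative):
>
>     # 1) install an external PMIx (distro package, or tarball from pmix.org)
>     tar xf pmix-3.1.2.tar.bz2 && cd pmix-3.1.2
>     ./configure --prefix=/usr && make && make install
>
>     # 2) build SLURM against that PMIx
>     cd ../slurm-18.08.5-2
>     ./configure --with-pmix=/usr && make && make install
>
>     # 3) build Open MPI against the same external PMIx (or keep its embedded one)
>     cd ../openmpi-4.0.0
>     ./configure --with-slurm --with-pmix=/usr && make && make install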
>
>
> Cheers, 
>
>
> Gilles 
>
>
> On 3/15/2019 12:24 PM, Riccardo Veraldi wrote: 
> >  thanks to all. 
> >  the problem is that slurm's configure is not able to find the pmix 
> >  includes 
> >  
> >  configure:20846: checking for pmix installation 
> >  configure:21005: result: 
> >  configure:21021: WARNING: unable to locate pmix installation 
> >  
> >  regardless of the path I give. 
> >  and the reason is that configure searches for the following includes: 
> >  
> >  test -f "$d/include/pmix/pmix_common.h" 
> >  test -f "$d/include/pmix_server.h" 
> >  
> >  but neither of the two is installed by openmpi. 
> >  
> >  one of the two is in the openmpi source code tarball 
> >  
> >  ./opal/mca/pmix/pmix3x/pmix/include/pmix_server.h 
> >  
> >  the other one only exists as a ".h.in" file, not ".h": 
> >  
> >  ./opal/mca/pmix/pmix3x/pmix/include/pmix_common.h.in 
> >  
> >  anyway they do not get installed by the rpm. 
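> >
> >  (For comparison, a standalone PMIx install, e.g. a tarball from pmix.org
> >  configured with --prefix=/usr, does ship real pmix_server.h and
> >  pmix_common.h headers under its include directory; an illustrative check,
> >  since the exact layout can vary with the PMIx version:
> >
> >       ls /usr/include | grep pmix
> >
> >  Once those headers are in place, slurm's configure test above should
> >  succeed with --with-pmix=/usr.)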
> >  
> >  the last thing I can try is to build openmpi directly from source 
> >  and give up on the rpm package build. The openmpi .spec also has 
> >  errors which I had to fix manually to get it to build successfully. 
> >  
> >  
> >  
> >  On 3/12/19 4:56 PM, Daniel Letai wrote: 
> >>  Hi. 
> >>  On 12/03/2019 22:53:36, Riccardo Veraldi wrote: 
> >>>  Hello, 
> >>>  after trying hard for over 10 days I am forced to write to the list. 
> >>>  I am not able to get SLURM to work with openmpi. Openmpi-compiled 
> >>>  binaries won't run under slurm, while all non-openmpi programs run 
> >>>  just fine under "srun". I am using SLURM 18.08.5, building the rpm 
> >>>  from the tarball: rpmbuild -ta slurm-18.08.5-2.tar.bz2 
> >>>  Prior to building SLURM I installed openmpi 4.0.0, which has built-in 
> >>>  pmix support. The pmix libraries are in /usr/lib64/pmix/, which is 
> >>>  the default installation path. 
> >>>  
> >>>  The problem is that hellompi is not working if I launch it from 
> >>>  srun. Of course it runs outside slurm. 
> >>>  
> >>>  [psanagpu105:10995] OPAL ERROR: Not initialized in file 
> >>>  pmix3x_client.c at line 113 
> >>>  -------------------------------------------------------------------------- 
> >>>  The application appears to have been direct launched using "srun", 
> >>>  but OMPI was not built with SLURM's PMI support and therefore cannot 
> >>>  execute. There are several options for building PMI support under 
> >>  
> >>  I would guess (but having the config.log files would verify it) that 
> >>  you should rebuild Slurm --with-pmix and then you should rebuild 
> >>  OpenMPI --with-slurm. 
> >>  
> >>  Currently there might be a bug in Slurm's configure script when building 
> >>  PMIx support without an explicit path, so you might either modify the spec 
> >>  before building (add --with-pmix=/usr to the configure section) or, for 
> >>  testing purposes, run ./configure --with-pmix=/usr; make; make install. 
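> >>
> >>  A quick way to confirm that the rebuilt Slurm actually picked up PMIx
> >>  (the exact plugin names depend on the PMIx version, so treat the ones
> >>  below as illustrative for a PMIx v3 build):
> >>
> >>       srun --mpi=list
> >>       # should now also report something like:
> >>       #   srun: pmix
> >>       #   srun: pmix_v3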
> >>  
> >>  
> >>  It seems your current configuration has a built-in mismatch - Slurm 
> >>  only supports pmi2, while OpenMPI only supports PMIx. You should 
> >>  build with at least one common PMI: either external PMIx when 
> >>  building Slurm, or Slurm's PMI2 when building OpenMPI. 
> >>  
> >>  However, I would have expected the non-PMI option (srun 
> >>  --mpi=openmpi) to work even in your env, and Slurm should have built 
> >>  PMIx support automatically since it's in the default search path. 
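> >>
> >>  Once both sides share a PMI flavor, direct launch should work, e.g.
> >>  (assuming a PMIx-enabled Slurm and your hellompi binary; the names
> >>  are illustrative):
> >>
> >>       srun --mpi=pmix -n 2 ./hellompi
> >>
> >>  or, with an Open MPI built against Slurm's PMI2 (--with-pmi):
> >>
> >>       srun --mpi=pmi2 -n 2 ./hellompi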
> >>  
> >>  
> >>>  SLURM, depending upon the SLURM version you are using: 
> >>>  
> >>>  version 16.05 or later: you can use SLURM's PMIx support. This 
> >>>  requires that you configure and build SLURM --with-pmix. 
> >>>  
> >>>  Versions earlier than 16.05: you must use either SLURM's PMI-1 or 
> >>>  PMI-2 support. SLURM builds PMI-1 by default, or you can manually 
> >>>  install PMI-2. You must then build Open MPI using --with-pmi pointing 
> >>>  to the SLURM PMI library location. 
> >>>  
> >>>  Please configure as appropriate and try again. 
> >>>  -------------------------------------------------------------------------- 
> >>>  *** An error occurred in MPI_Init 
> >>>  *** on a NULL communicator 
> >>>  *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, 
> >>>  *** and potentially your MPI job) 
> >>>  [psanagpu105:10995] Local abort before MPI_INIT completed completed 
> >>>  successfully, but am not able to aggregate error messages, and not 
> >>>  able to guarantee that all other processes were killed! 
> >>>  srun: error: psanagpu105: task 0: Exited with exit code 1 
> >>>  
> >>>  I really have no clue. I even reinstalled openmpi into a 
> >>>  different path, /opt/openmpi/4.0.0. 
> >>>  Anyway, it seems like slurm does not know how to find the MPI 
> >>>  libraries even though they are there, right now in the default 
> >>>  path /usr/lib64. 
> >>>  
> >>>  even using --mpi=pmi2 or --mpi=openmpi does not fix the problem and 
> >>>  the same error message is given to me. 
> >>>  srun --mpi=list 
> >>>  srun: MPI types are... 
> >>>  srun: none 
> >>>  srun: openmpi 
> >>>  srun: pmi2 
> >>>  
> >>>  
> >>>  Any hint on how I could fix this problem? 
> >>>  thanks a lot 
> >>>  
> >>>  Rick 
> >>>  
> >>>  
> >>  -- 
> >>  Regards, 
> >>  
> >>  Dani_L. 
> >  
> >  
>

