[slurm-users] problems with slurm and openmpi

Riccardo Veraldi riccardo.veraldi at gmail.com
Fri Mar 15 03:24:21 UTC 2019


Thanks to all.
The problem is that slurm's configure is not able to find the pmix headers:

configure:20846: checking for pmix installation
configure:21005: result:
configure:21021: WARNING: unable to locate pmix installation

regardless of the path I give it.
The reason is that configure looks for the following headers:

test -f "$d/include/pmix/pmix_common.h"
test -f "$d/include/pmix_server.h"

but neither of the two is installed by openmpi.

One of the two is in the openmpi source code tarball:

./opal/mca/pmix/pmix3x/pmix/include/pmix_server.h

The other one exists only as a ".h.in" file, not as a ".h":

./opal/mca/pmix/pmix3x/pmix/include/pmix_common.h.in

Either way, neither of them gets installed by the rpm.
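
For anyone hitting the same thing, the check can be reproduced by hand before running configure. This is only a minimal sketch: the helper name has_pmix_headers is made up, the list of prefixes is an assumption (adjust it to your system), and it assumes that finding either header is enough, since how configure combines the two tests is not shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper: repeat the two header tests quoted from configure.
# Prints "found" and returns 0 if either header exists under prefix $1,
# otherwise prints "missing" and returns 1.
has_pmix_headers() {
    d=$1
    if test -f "$d/include/pmix/pmix_common.h" ||
       test -f "$d/include/pmix_server.h"; then
        echo "found"
        return 0
    fi
    echo "missing"
    return 1
}

# Candidate prefixes are assumptions, not paths configure is known to search.
for prefix in /usr /usr/local /opt/openmpi/4.0.0; do
    printf '%s: ' "$prefix"
    has_pmix_headers "$prefix" || true
done
```

A prefix that reports "found" would be the one to try passing to --with-pmix.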

The last thing I can try is to build openmpi directly from source and give 
up on the rpm package build. The openmpi .spec also has errors, which I 
had to fix manually to get it to build successfully.
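
Once slurm is rebuilt, one way to see whether it picked up pmix is to look for a pmix entry in the "srun --mpi=list" output (in the listing quoted below only none, openmpi and pmi2 appear). A small sketch, assuming the "srun: <name>" line format shown in that listing; the helper name mpi_list_has is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads "srun --mpi=list" output on stdin and succeeds
# if a listed MPI type starts with the word given as $1 (e.g. pmix, pmix_v3).
mpi_list_has() {
    grep -q "^srun: $1"
}

# Only query srun if it is actually installed on this machine.
# stderr is merged into stdout in case srun prints the list there.
if command -v srun >/dev/null 2>&1; then
    if srun --mpi=list 2>&1 | mpi_list_has pmix; then
        echo "pmix plugin is available"
    else
        echo "no pmix plugin listed"
    fi
fi
```

If pmix is listed, launching with "srun --mpi=pmix" would be the next thing to try.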



On 3/12/19 4:56 PM, Daniel Letai wrote:
> Hi.
> On 12/03/2019 22:53:36, Riccardo Veraldi wrote:
>> Hello,
>> after trying hard for over 10 days I am forced to write to the list.
>> I am not able to get SLURM to work with openmpi. Openmpi-compiled 
>> binaries won't run under slurm, while all non-openmpi programs run just 
>> fine under "srun". I am using SLURM 18.08.5, building the rpm from the 
>> tarball: rpmbuild -ta slurm-18.08.5-2.tar.bz2
>> Prior to building SLURM I installed openmpi 4.0.0, which has built-in 
>> pmix support. The pmix libraries are in /usr/lib64/pmix/, which is the 
>> default installation path.
>>
>> The problem is that hellompi does not work if I launch it from srun. 
>> Of course it runs outside slurm.
>>
>> [psanagpu105:10995] OPAL ERROR: Not initialized in file 
>> pmix3x_client.c at line 113
>> --------------------------------------------------------------------------
>> The application appears to have been direct launched using "srun",
>> but OMPI was not built with SLURM's PMI support and therefore cannot
>> execute. There are several options for building PMI support under
>
> I would guess (but having the config.log files would verify it) that 
> you should rebuild Slurm --with-pmix and then you should rebuild 
> OpenMPI --with Slurm.
>
> Currently there might be a bug in Slurm's configure script that breaks 
> building PMIx support when no path is given, so you might either modify 
> the spec before building (add --with-pmix=/usr to the configure section) 
> or, for testing purposes, run ./configure --with-pmix=/usr; make; make install.
>
>
> It seems your current configuration has a built-in mismatch: Slurm only 
> supports pmi2, while OpenMPI only supports PMIx. You should build with 
> at least one common PMI: either external PMIx when building Slurm, or 
> Slurm's PMI2 when building OpenMPI.
>
> However, I would have expected the non-PMI option (srun --mpi=openmpi) 
> to work even in your env, and Slurm should have built PMIx support 
> automatically since it's in default search path.
>
>
>> SLURM, depending upon the SLURM version you are using:
>>
>>   version 16.05 or later: you can use SLURM's PMIx support. This
>>   requires that you configure and build SLURM --with-pmix.
>>
>>   Versions earlier than 16.05: you must use either SLURM's PMI-1 or
>>   PMI-2 support. SLURM builds PMI-1 by default, or you can manually
>>   install PMI-2. You must then build Open MPI using --with-pmi pointing
>>   to the SLURM PMI library location.
>>
>> Please configure as appropriate and try again.
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** on a NULL communicator
>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> ***    and potentially your MPI job)
>> [psanagpu105:10995] Local abort before MPI_INIT completed completed 
>> successfully, but am not able to aggregate error messages, and not 
>> able to guarantee that all other processes were killed!
>> srun: error: psanagpu105: task 0: Exited with exit code 1
>>
>> I really have no clue. I even reinstalled openmpi under a separate 
>> path, /opt/openmpi/4.0.0.
>> Anyway, it seems that slurm does not know how to find the MPI libraries 
>> even though they are there, right now in the default path /usr/lib64.
>>
>> Even using --mpi=pmi2 or --mpi=openmpi does not fix the problem; 
>> I get the same error message.
>> srun --mpi=list
>> srun: MPI types are...
>> srun: none
>> srun: openmpi
>> srun: pmi2
>>
>>
>> Any hint on how I could fix this problem?
>> Thanks a lot
>>
>> Rick
>>
>>
> -- 
> Regards,
>
> Dani_L.

