[slurm-users] problems with slurm and openmpi

Riccardo Veraldi riccardo.veraldi at gmail.com
Fri Mar 15 04:34:20 UTC 2019


Hello,
I installed openmpi from source, and all the libraries and 
include files were installed correctly in /opt/openmpi/4.0.0,
as I prefer a directory that I can export via NFS rather than the 
default /usr/local.

Anyway, slurm's configure still fails on the pmix check:

./configure --with-pmix=/opt/openmpi/4.0.0/


configure:20846: checking for pmix installation
configure:20881: gcc -o conftest -g -O2 -pthread 
-I/opt/openmpi/4.0.0//include    conftest.c -L/opt/openmpi/4.0.0//lib 
-lpmix   >&5
/opt/openmpi/4.0.0//lib/libpmix.so: undefined reference to 
`opal_libevent2022_evthread_use_pthreads'
/opt/openmpi/4.0.0//lib/libpmix.so: undefined reference to 
`opal_libevent2022_event_base_loop'
/opt/openmpi/4.0.0//lib/libpmix.so: undefined reference to 
`opal_libevent2022_event_add'
/opt/openmpi/4.0.0//lib/libpmix.so: undefined reference to 
`opal_libevent2022_event_base_free'
/opt/openmpi/4.0.0//lib/libpmix.so: undefined reference to 
`opal_libevent2022_event_active'
/opt/openmpi/4.0.0//lib/libpmix.so: undefined reference to 
`opal_libevent2022_event_base_loopbreak'
/opt/openmpi/4.0.0//lib/libpmix.so: undefined reference to 
`opal_libevent2022_event_del'
/opt/openmpi/4.0.0//lib/libpmix.so: undefined reference to 
`opal_libevent2022_event_assign'
/opt/openmpi/4.0.0//lib/libpmix.so: undefined reference to 
`opal_libevent2022_event_base_new'
collect2: error: ld returned 1 exit status
configure:20881: $? = 1

I did set LD_LIBRARY_PATH

echo $LD_LIBRARY_PATH
/opt/openmpi/4.0.0/lib

Any hints?

thank you very much

Rick
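
A note on the configure log above: every missing symbol carries the
opal_libevent2022_ prefix, which points at the libevent copy that Open MPI
bundles and renames internally, not at a standalone libevent. The sketch
below is one way to narrow this down. It is a minimal sketch under
assumptions: undefined_symbols and find_definers are hypothetical helper
names, config.log is assumed to be the log written by slurm's configure,
and the nm and grep options are the GNU ones.

```shell
# Hypothetical helper: print the unique symbol names from "undefined reference"
# linker errors read on stdin (for example, from slurm's config.log).
undefined_symbols() {
    # Keep each error line plus the following line (the symbol may be wrapped),
    # then extract the name the linker quotes as `name'.
    grep -A1 'undefined reference' \
        | sed -n 's/.*`\([A-Za-z_][A-Za-z0-9_]*\).*/\1/p' \
        | sort -u
}

# Hypothetical helper: list the shared libraries under PREFIX/lib that
# define SYMBOL. Assumes GNU binutils nm is installed.
find_definers() {
    prefix=$1
    symbol=$2
    for lib in "$prefix"/lib/*.so; do
        [ -e "$lib" ] || continue
        if nm -D --defined-only "$lib" 2>/dev/null | grep -qw "$symbol"; then
            printf '%s\n' "$lib"
        fi
    done
}

# Example usage (paths taken from the message above):
#   undefined_symbols < config.log
#   find_definers /opt/openmpi/4.0.0 opal_libevent2022_event_add
```

If find_definers reports a library other than libpmix.so, that library has
to be on the link line for the test program to link. Whether slurm's
configure can be made to add it is a separate question that this sketch
does not answer.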


On 3/12/19 5:19 PM, Gilles Gouaillardet wrote:
> Rick,
>
>
> The issue is SLURM can only provide pmi2 support, and it seems Open 
> MPI only supports pmix
>
>
> One option is to rebuild SLURM with PMIx as explained by Daniel, and then
>
> srun --mpi=pmix ...
>
>
> If you do not want to (or cannot) rebuild SLURM, you can use the older 
> pmi or pmi2.
>
> In that case, you have to rebuild Open MPI and pass --with-pmi to the 
> configure command line
>
>
> and then
>
> srun --mpi=pmi2 ...
>
> (or srun --mpi=pmi ...)
>
>
> Finally, you can
>
> scontrol show config | grep MpiDefault
>
>
> and have your sysadmin update this so a simple
>
> srun ....
>
> will run without any --mpi=... parameter
>
>
> Cheers,
>
>
> Gilles
>
> On 3/13/2019 5:53 AM, Riccardo Veraldi wrote:
>> Hello,
>> after trying hard for over 10 days I am forced to write to the list.
>> I am not able to get SLURM to work with openmpi. Binaries compiled 
>> with openmpi won't run under slurm, while all non-openmpi programs run 
>> just fine under "srun". I am using SLURM 18.08.5, building the rpm from 
>> the tarball: rpmbuild -ta slurm-18.08.5-2.tar.bz2
>> Prior to building SLURM I installed openmpi 4.0.0, which has built-in 
>> pmix support. The pmix libraries are in /usr/lib64/pmix/, which is the 
>> default installation path.
>>
>> The problem is that hellompi does not work if I launch it from srun. 
>> Of course, it runs outside slurm.
>>
>> [psanagpu105:10995] OPAL ERROR: Not initialized in file 
>> pmix3x_client.c at line 113
>> -------------------------------------------------------------------------- 
>>
>> The application appears to have been direct launched using "srun",
>> but OMPI was not built with SLURM's PMI support and therefore cannot
>> execute. There are several options for building PMI support under
>> SLURM, depending upon the SLURM version you are using:
>>
>>   version 16.05 or later: you can use SLURM's PMIx support. This
>>   requires that you configure and build SLURM --with-pmix.
>>
>>   Versions earlier than 16.05: you must use either SLURM's PMI-1 or
>>   PMI-2 support. SLURM builds PMI-1 by default, or you can manually
>>   install PMI-2. You must then build Open MPI using --with-pmi pointing
>>   to the SLURM PMI library location.
>>
>> Please configure as appropriate and try again.
>> -------------------------------------------------------------------------- 
>>
>> *** An error occurred in MPI_Init
>> *** on a NULL communicator
>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> ***    and potentially your MPI job)
>> [psanagpu105:10995] Local abort before MPI_INIT completed completed 
>> successfully, but am not able to aggregate error messages, and not 
>> able to guarantee that all other processes were killed!
>> srun: error: psanagpu105: task 0: Exited with exit code 1
>>
>> I really have no clue. I even reinstalled openmpi in a separate 
>> path, /opt/openmpi/4.0.0.
>> Anyway, it seems that slurm does not know how to find the MPI libraries, 
>> even though they are there, currently in the default path /usr/lib64.
>>
>> Even using --mpi=pmi2 or --mpi=openmpi does not fix the problem; 
>> I get the same error message.
>> srun --mpi=list
>> srun: MPI types are...
>> srun: none
>> srun: openmpi
>> srun: pmi2
>>
>>
>> Any hints on how I could fix this problem?
>> thanks a lot
>>
>> Rick
>>
>>
>
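
The srun --mpi=list output quoted above shows no pmix entry, which matches
Gilles's point that this slurm build only offers pmi2. A minimal sketch for
checking this from a script follows. has_mpi_plugin is a hypothetical helper
name, the line format is assumed to be the one shown in the quoted message
(slurm 18.08), and the 2>&1 is an assumption that srun may print the list
on stderr.

```shell
# Hypothetical helper: succeed if the plugin NAME appears in
# "srun --mpi=list" output read on stdin.
has_mpi_plugin() {
    # Each plugin is expected on its own line as "srun: NAME".
    grep -qxF "srun: $1"
}

# Example usage:
#   if srun --mpi=list 2>&1 | has_mpi_plugin pmix; then
#       srun --mpi=pmix ./hellompi
#   else
#       echo "this slurm build has no pmix plugin" >&2
#   fi
```

The exact-line match is deliberate: it keeps a query for "pmi" from
matching the "pmi2" line.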



