[slurm-users] [EXT] Re: pmix issue

Andy Riebs andy at candooz.com
Tue Dec 8 13:53:52 UTC 2020


Thanks for the follow-up Phil, both the problem and the way that you 
tracked it down. Another good note for the Slurm Users' Toolkit!

Andy

On 12/7/2020 9:02 PM, Yuengling, Philip J. wrote:
>
> Thanks everyone for your replies!
>
> It turned out that a library dependency, libevent, wasn’t being found 
> at run time.  While I thought I was using a shared-location library, 
> I was not.  I had apparently set up /etc/ld.so.conf.d on the build 
> host to use /usr/local/lib, which has libevent in it, but none of the 
> other nodes had this entry.  The problem became apparent after going 
> to each node and running pmix_info.
>
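The per-node check described above can be sketched as a small shell helper. This is a minimal sketch, not the command Phil actually ran: the function name missing_libs is hypothetical, and the plugin path in the usage comment is an assumption based on the --libdir=/usr/lib64 shown in the Slurm configure line further down the thread.

```shell
# Sketch: list shared-library dependencies that the dynamic loader cannot
# resolve. Reads `ldd` output on stdin and prints each name that ldd
# reported as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Hypothetical usage on a compute node (the plugin path is an assumption):
#   ldd /usr/lib64/slurm/mpi_pmix_v3.so | missing_libs
```

Running the same pipeline on the build host and on a compute node, then comparing the two outputs, should show a library that only the build host can resolve.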
> This means I should remove the ld.so.conf.d entry and rebuild 
> everything against the preferred set of libraries.  Otherwise pmix 
> appears to work as expected now.
>
> Cheers!
>
> Phil
>
> *From: *slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf 
> of Philip Kovacs <pkdevel at yahoo.com>
> *Reply-To: *Philip Kovacs <pkdevel at yahoo.com>, Slurm User Community 
> List <slurm-users at lists.schedmd.com>
> *Date: *Monday, December 7, 2020 at 10:55 AM
> *To: *"andy at candooz.com" <andy at candooz.com>, Slurm User Community List 
> <slurm-users at lists.schedmd.com>
> *Subject: *Re: [slurm-users] [EXT] Re: pmix issue
>
> Make sure the .so symlink for the pmix lib is available -- not just 
> the versioned .so, e.g. .so.2.  Slurm requires that .so symlink.  Some 
> distros split packages into base/devel, so you may need to install a 
> pmix-devel package, if available, in order to add the .so symlink 
> (which is considered a "development" file).
>
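The symlink check suggested above could look roughly like the following. It is only a sketch: has_unversioned_so is a hypothetical helper name, and the lib subdirectory used in the usage comment is an assumption about how the pmix install prefix is laid out.

```shell
# Sketch: report whether a library directory provides the unversioned
# lib<name>.so entry, not just versioned files such as lib<name>.so.2.
has_unversioned_so() {
    dir=$1
    name=$2    # e.g. "pmix" for libpmix.so
    if [ -e "$dir/lib$name.so" ]; then
        echo "found: $dir/lib$name.so"
    else
        echo "missing: $dir/lib$name.so"
        return 1
    fi
}

# Hypothetical usage (the lib subdirectory is an assumption):
#   has_unversioned_so /share/local/pmix-3.2.1/lib pmix
```

Because -e follows symlinks, a dangling libpmix.so link is reported as missing as well.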
> On Monday, December 7, 2020, 09:22:06 AM EST, Yuengling, Philip J. 
> <philip.yuengling at jhuapl.edu> wrote:
>
> Thanks Andy,
>
> Slurm was compiled with --with-pmix=/share/local/pmix-3.2.1.  The 
> build of pmix is installed under /share/local/pmix-3.2.1 which is an 
> NFS share across all the nodes.  I should also note I used 
> devtoolset-10 (gcc 10) on RHEL7 and confirmed that everything was 
> compiled with that version of compiler.
>
> I also set LD_LIBRARY_PATH to include /share/local/pmix-3.2.1
>
> Cheers!
>
> Phil
>
> *From: *slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf 
> of Andy Riebs <andy at candooz.com>
> *Reply-To: *"andy at candooz.com" <andy at candooz.com>, Slurm User 
> Community List <slurm-users at lists.schedmd.com>
> *Date: *Friday, December 4, 2020 at 3:07 PM
> *To: *"slurm-users at lists.schedmd.com" <slurm-users at lists.schedmd.com>
> *Subject: *[EXT] Re: [slurm-users] pmix issue
>
> Also, Slurm was built with "/fs/local/pmix-3.2.1" -- does that 
> translate well to "/share/local/pmix-3.2.1"?
>
> Andy
>
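Both questions here, whether the install prefix exists on the compute nodes and whether /fs/local and /share/local resolve to the same place, can be answered by testing the paths on every node. The helper below is a sketch under stated assumptions: check_paths is a hypothetical name, and the usage comment assumes it is saved as a script named check_paths.sh that ends with a call to check_paths "$@".

```shell
# Sketch: print one status line per path, prefixed with the node name,
# so the output can be compared across nodes.
check_paths() {
    status=0
    for p in "$@"; do
        if [ -d "$p" ]; then
            echo "$(uname -n): ok $p"
        else
            echo "$(uname -n): MISSING $p"
            status=1
        fi
    done
    return $status
}

# Hypothetical usage, assuming a script check_paths.sh that ends with
# `check_paths "$@"`:
#   srun -N5 sh check_paths.sh /share/local/pmix-3.2.1 /fs/local/pmix-3.2.1
```

The function returns non-zero if any path is missing, so a failing node is also visible in the srun exit status.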
> On 12/4/2020 2:59 PM, Andy Riebs wrote:
>
>     Are you sure that /share/local/pmix-3.2.1 exists on the compute nodes?
>
>     On 12/4/2020 2:54 PM, Yuengling, Philip J. wrote:
>
>         Hi everyone,
>
>         I’ve been having difficulty getting the --mpi=pmix_v3 option
>         to work for me.  I can get --mpi=pmi2 to work ok, but I really
>         want to understand what I’m doing wrong here.  Everything
>         seems to build ok.
>
>         $ srun --mpi=list
>         srun: MPI types are...
>         srun: pmix
>         srun: pmix_v3
>         srun: cray_shasta
>         srun: none
>         srun: pmi2
>
>         $ srun --mpi=pmix_v3 -N5 date
>         srun: error: task 1 launch failed: Invalid MPI plugin name
>         srun: error: task 2 launch failed: Invalid MPI plugin name
>         srun: error: task 3 launch failed: Invalid MPI plugin name
>         srun: error: task 4 launch failed: Invalid MPI plugin name
>         srun: error: task 0 launch failed: Invalid MPI plugin name
>
>         $ srun --mpi=pmi2 -N5 date
>         Fri Dec  4 13:52:39 EST 2020
>         Fri Dec  4 13:52:39 EST 2020
>         Fri Dec  4 13:52:39 EST 2020
>         Fri Dec  4 13:52:39 EST 2020
>         Fri Dec  4 13:52:39 EST 2020
>
>         openpmix:
>
>         CC=/opt/rh/devtoolset-10/root/usr/bin/gcc ./configure
>         --prefix=/share/local/pmix-3.2.1
>         --with-hwloc=/share/local/hwloc-2.4.0
>
>         Slurm 20.11.0:
>
>         rpmbuild --define "_with_pmix
>         --with-pmix=/fs/local/pmix-3.2.1" -ta slurm-20.11.0.tar.bz2
>
>         From config.log:
>
>         ./configure --build=x86_64-redhat-linux-gnu
>         --host=x86_64-redhat-linux-gnu --program-prefix=
>         --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr
>         --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc/slurm
>         --datadir=/usr/share --includedir=/usr/include
>         --libdir=/usr/lib64 --libexecdir=/usr/libexec
>         --localstatedir=/var --sharedstatedir=/var/lib
>         --mandir=/usr/share/man --infodir=/usr/share/info
>         --with-pmix=/fs/local/pmix-3.2.1 --disable-slurmrestd
>
>         Open MPI 4.0.5:
>
>         ./configure  '--prefix=/share/openmpi-4.0.5' '--with-cuda'
>         '--with-pmix=/share/local/pmix-3.2.1' '--with-pmi=/usr'
>         '--with-slurm' '--without-ucx' '--without-verbs'
>
>         -- 
>
>         Philip J. Yuengling
>
>         Johns Hopkins University
>