[slurm-users] [EXT] Re: pmix issue
Andy Riebs
andy at candooz.com
Mon Dec 7 14:56:18 UTC 2020
Hi Phil,
From a distance, it feels like there may be a mismatch in Slurm
versions (an auxiliary build hiding out somewhere?). You might try
something like

$ which srun; srun which srun

just to confirm that both the submit and execute nodes are running the
same Slurm instance.
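The same comparison can be run across every node in an allocation.
A minimal sketch, assuming a POSIX shell; `summarize_paths` and
`all_same` are hypothetical helper names, not Slurm tools:

```shell
#!/bin/sh
# Hypothetical helper: read one path per line on stdin and print each
# distinct path together with how many lines (nodes) reported it.
summarize_paths() {
    sort | uniq -c
}

# Hypothetical helper: succeed if stdin contains at most one distinct
# line, i.e. every node reported the same path.
all_same() {
    [ "$(sort -u | wc -l | tr -d ' ')" -le 1 ]
}
```

For example, `srun -N5 --ntasks-per-node=1 which srun | summarize_paths`
should print a single line if every node resolves `srun` to the same
binary; more than one line would point at the kind of stray build
suspected above. `... | all_same && echo "same srun everywhere"` gives
a yes/no answer instead.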
Andy
On 12/7/2020 9:19 AM, Yuengling, Philip J. wrote:
>
> Thanks Andy,
>
> Slurm was compiled with --with-pmix=/share/local/pmix-3.2.1. The build
> of PMIx is installed under /share/local/pmix-3.2.1, which is an NFS
> share across all the nodes. I should also note that I used
> devtoolset-10 (gcc 10) on RHEL7 and confirmed that everything was
> compiled with that compiler version.
>
> I also set LD_LIBRARY_PATH to include /share/local/pmix-3.2.1
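One thing worth double-checking: the shared libraries from a default
PMIx install normally land in `<prefix>/lib`, so it is that directory,
not the bare prefix, that LD_LIBRARY_PATH would need to name. A small
check, assuming the default libdir and a POSIX shell (`path_has_dir` is
a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $2 appears as an exact
# element of the colon-separated list in $1.
path_has_dir() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: PMIx was installed with the default libdir, <prefix>/lib.
PMIX_PREFIX=/share/local/pmix-3.2.1
if path_has_dir "${LD_LIBRARY_PATH:-}" "$PMIX_PREFIX/lib"; then
    echo "LD_LIBRARY_PATH includes $PMIX_PREFIX/lib"
else
    echo "LD_LIBRARY_PATH does not include $PMIX_PREFIX/lib"
fi
```

Keep in mind this only inspects the environment of the shell it runs
in; the daemons on the compute nodes may be started with a different
environment.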
>
> Cheers!
>
> Phil
>
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf
> of Andy Riebs <andy at candooz.com>
> Reply-To: "andy at candooz.com" <andy at candooz.com>, Slurm User
> Community List <slurm-users at lists.schedmd.com>
> Date: Friday, December 4, 2020 at 3:07 PM
> To: "slurm-users at lists.schedmd.com" <slurm-users at lists.schedmd.com>
> Subject: [EXT] Re: [slurm-users] pmix issue
>
> Also, Slurm was built with "/fs/local/pmix-3.2.1" -- does that
> translate well to "/share/local/pmix-3.2.1"?
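To confirm which PMIx prefix the Slurm build actually recorded, the
`--with-pmix=` value can be pulled out of config.log and tested for
existence. A rough sketch, assuming a POSIX shell with `grep -o`; both
function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the first value passed to --with-pmix= in
# a config.log (or any file containing the configure command line).
# Stops at a space or a single quote, in case the options were quoted.
pmix_prefix_from_log() {
    grep -o -- "--with-pmix=[^ ']*" "$1" | head -n 1 | cut -d= -f2-
}

# Hypothetical helper: report whether the recorded prefix exists on
# this machine. Returns 0 if it exists, 1 if missing, 2 if not found.
check_pmix_prefix() {
    pmix_prefix=$(pmix_prefix_from_log "$1")
    if [ -z "$pmix_prefix" ]; then
        echo "no --with-pmix= found in $1"
        return 2
    elif [ -d "$pmix_prefix" ]; then
        echo "$pmix_prefix exists"
    else
        echo "$pmix_prefix is missing"
        return 1
    fi
}
```

On the build host, `check_pmix_prefix config.log` would report whether
/fs/local/pmix-3.2.1 exists there. For the compute nodes, something
like `srun -N5 --ntasks-per-node=1 test -d /fs/local/pmix-3.2.1` exits
non-zero on any node where that directory is missing.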
>
> Andy
>
> On 12/4/2020 2:59 PM, Andy Riebs wrote:
>
> Are you sure that /share/local/pmix-3.2.1 exists on the compute nodes?
>
> On 12/4/2020 2:54 PM, Yuengling, Philip J. wrote:
>
> Hi everyone,
>
> I’ve been having difficulty getting the --mpi=pmix_v3 option
> to work for me. I can get --mpi=pmi2 to work, but I really
> want to understand what I’m doing wrong here. Everything
> seems to build OK.
>
> $ srun --mpi=list
> srun: MPI types are...
> srun: pmix
> srun: pmix_v3
> srun: cray_shasta
> srun: none
> srun: pmi2
>
> $ srun --mpi=pmix_v3 -N5 date
> srun: error: task 1 launch failed: Invalid MPI plugin name
> srun: error: task 2 launch failed: Invalid MPI plugin name
> srun: error: task 3 launch failed: Invalid MPI plugin name
> srun: error: task 4 launch failed: Invalid MPI plugin name
> srun: error: task 0 launch failed: Invalid MPI plugin name
>
> $ srun --mpi=pmi2 -N5 date
> Fri Dec 4 13:52:39 EST 2020
> Fri Dec 4 13:52:39 EST 2020
> Fri Dec 4 13:52:39 EST 2020
> Fri Dec 4 13:52:39 EST 2020
> Fri Dec 4 13:52:39 EST 2020
>
> openpmix:
>
> CC=/opt/rh/devtoolset-10/root/usr/bin/gcc ./configure \
>     --prefix=/share/local/pmix-3.2.1 \
>     --with-hwloc=/share/local/hwloc-2.4.0
>
> Slurm 20.11.0:
>
> rpmbuild --define "_with_pmix --with-pmix=/fs/local/pmix-3.2.1" \
>     -ta slurm-20.11.0.tar.bz2
>
> From config.log:
>
> ./configure --build=x86_64-redhat-linux-gnu \
>     --host=x86_64-redhat-linux-gnu --program-prefix= \
>     --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr \
>     --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc/slurm \
>     --datadir=/usr/share --includedir=/usr/include \
>     --libdir=/usr/lib64 --libexecdir=/usr/libexec \
>     --localstatedir=/var --sharedstatedir=/var/lib \
>     --mandir=/usr/share/man --infodir=/usr/share/info \
>     --with-pmix=/fs/local/pmix-3.2.1 --disable-slurmrestd
>
> Open MPI 4.0.5:
>
> ./configure '--prefix=/share/openmpi-4.0.5' '--with-cuda' \
>     '--with-pmix=/share/local/pmix-3.2.1' '--with-pmi=/usr' \
>     '--with-slurm' '--without-ucx' '--without-verbs'
>
> --
>
> Philip J. Yuengling
>
> Johns Hopkins University
>