[slurm-users] [EXTERNAL] OpenMPI and Slurm clarification?
Craig
cfreese at super.org
Tue Mar 28 14:33:32 UTC 2023
Ok, thanks. "Coordinating" with sys admins is problematic so I guess
I'll just continue with the internal pmix and keep an eye out for problems.
At least I know I'm not doing anything blatantly stupid.
On 3/27/23 20:46, Pritchard Jr., Howard wrote:
>
> HI Craig,
>
> It's not essential to use the pmix lib used to build the SLURM pmix
> plugin, but doing so does reduce the likelihood of problems.
>
> I don’t know how, but there is some way that the admin installing
> SLURM can “name” the available pmix --mpi options.
>
> For instance on one of our systems, the admin has built multiple
> variants of the pmix plugin:
>
> MPI plugin types are...
>
> cray_shasta
>
> none
>
> pmi2
>
> pmix
>
> specific pmix plugin versions available:
> pmix_v2,pmix_v3,pmix_v314,pmix_v4,pmix_v422
>
> This naming convention has helped us with “decoupling” building of
> Open MPI from SLURM build, but does mean some coordination with the
> sys admins.
>
> We’re using SLURM 22.05.6
>
> Hope this helps,
>
> Howard
>
> *From: *slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf
> of Craig <cfreese at super.org>
> *Reply-To: *Slurm User Community List <slurm-users at lists.schedmd.com>
> *Date: *Monday, March 27, 2023 at 2:01 PM
> *To: *"slurm-users at lists.schedmd.com" <slurm-users at lists.schedmd.com>
> *Subject: *Re: [slurm-users] [EXTERNAL] OpenMPI and Slurm clarification?
>
> conf.log...
>
> checking if user requested PMI support
>
> result: no
>
> checking if user requested internal PMIx support(yes)
>
> result: no
>
> checking for pmix.h in /usr
>
> result: not found
>
> checking for pmix.h in /usr/include
>
> result: not found
>
> WARNING: discovered external PMIx version is less than internal
> version 3.x
>
> WARNING: using internal PMIx
>
> So it looks like it used the internal version (which is what I was
> aiming for) and that's ok by me since it seems to be working, but if
> I'm really supposed to be using the same one that SLURM used then I'm
> gonna have to figure out a way to determine what that was/is.
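
A minimal sketch of the config.log check discussed above. It assumes the log contains the "using internal PMIx" warning text exactly as quoted in this thread; other Open MPI versions may word it differently. The function name pmix_choice is made up for illustration.

```shell
# Report whether an Open MPI build fell back to its bundled PMIx.
# Assumption: config.log contains the warning string quoted in this thread.
pmix_choice() {
    log="${1:-config.log}"
    if [ ! -r "$log" ]; then
        echo "unreadable"
        return 2
    fi
    if grep -q 'using internal PMIx' "$log"; then
        echo "internal"
    else
        # No warning found: either an external PMIx was used or the
        # log words it differently, so do not claim more than that.
        echo "not-internal-or-unknown"
    fi
}
```

Run it from the Open MPI build directory as "pmix_choice config.log". Treat the second answer as a prompt to read the log by hand, not as proof that an external PMIx was picked up.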
>
> On 3/27/23 15:28, Pritchard Jr., Howard wrote:
>
> HI Craig,
>
> Your use of --with-pmix on the Open MPI configure line is
> important. With no argument to this option, Open MPI's configure
> first checks whether there’s an external pmix that is newer than
> the one included in the Open MPI release tarball. If there is
> not, the internal pmix is built.
>
> You can check in the config.log whether the internal PMIx or an
> external one was used.
>
> If you want to be extra careful, find the location of the PMIx v3
> used to build the SLURM PMIx plugin, and then rebuild your open
> mpi 4.1.5 with
>
> ./configure …
> --with-pmix=path_to_pmix_used_for_slurm_pmix_plugin_build ….
>
> But you may be okay without doing this. You can check this by
> running your open mpi job with
>
> srun --mpi=pmix_v3 -N2 foo
>
> and see if it behaves as expected.
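
A sketch of the rebuild step described above, under stated assumptions: the PMIx install prefix has to be supplied by the reader (the thread does not say where Slurm's PMIx lives), and the helper name print_ompi_configure is invented here. The include/pmix.h check mirrors the header lookup visible in the configure log elsewhere in this thread. Only the --with-pmix option is printed; the remaining configure options are left out, as in the message above.

```shell
# Print a configure command for building Open MPI against an external PMIx.
# The prefix argument is a placeholder for wherever that PMIx is installed.
print_ompi_configure() {
    prefix="$1"
    # Refuse to print anything if the prefix does not look like a PMIx install.
    if [ -z "$prefix" ] || [ ! -f "$prefix/include/pmix.h" ]; then
        echo "no pmix.h under '$prefix/include'" >&2
        return 1
    fi
    echo "./configure --with-pmix=$prefix"
}
```

Append your other configure options to the printed line, rebuild, and then try the srun test that follows.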
>
> I’m not sure what the “openmpi” result from srun --mpi=list is about.
>
> Howard
>
> *From: *slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf
> of Craig <cfreese at super.org>
> *Reply-To: *Slurm User Community List <slurm-users at lists.schedmd.com>
> *Date: *Monday, March 27, 2023 at 12:54 PM
> *To: *"slurm-users at lists.schedmd.com" <slurm-users at lists.schedmd.com>
> *Subject: *Re: [slurm-users] [EXTERNAL] OpenMPI and Slurm clarification?
>
> srun: MPI types are...
>
> srun: none
>
> srun: openmpi
>
> srun: pmix_v3
>
> srun: pmi2
>
> srun: pmix
>
> but I'm not sure that tells me much about how I am supposed to be
> building OpenMPI?
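
One way to narrow that listing down to the PMIx choices is sketched below. It assumes the output format shown in this thread: one name per line, optionally prefixed with "srun: ", or several names separated by commas. The function name list_pmix_plugins is made up for illustration.

```shell
# Read `srun --mpi=list` output on stdin and print only the PMIx plugin names.
# Assumption: names look like "pmix" or "pmix_v<digits>", as in this thread.
list_pmix_plugins() {
    tr ',' '\n' \
        | sed -e 's/^srun:[[:space:]]*//' \
              -e 's/^[[:space:]]*//' \
              -e 's/[[:space:]]*$//' \
        | grep -E '^pmix(_v[0-9]+)?$' \
        | sort -u
}
```

A possible invocation is "srun --mpi=list 2>&1 | list_pmix_plugins"; the 2>&1 is there in case srun writes the listing to stderr. The names it prints are the candidates for the --mpi option, but they do not by themselves say which PMIx library each plugin was built against.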
>
> On 3/27/23 14:41, Pritchard Jr., Howard wrote:
>
> HI Craig,
>
> If you run
>
> srun --mpi=list
>
> what does slurm report?
>
> That will help in determining what argument you want to supply
> for the --mpi srun option.
>
> Howard
>
> *From: *slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf
> of Craig <cfreese at super.org>
> *Reply-To: *Slurm User Community List <slurm-users at lists.schedmd.com>
> *Date: *Monday, March 27, 2023 at 12:38 PM
> *To: *"slurm-users at lists.schedmd.com" <slurm-users at lists.schedmd.com>
> *Subject: *[EXTERNAL] [slurm-users] OpenMPI and Slurm clarification?
>
>
> Can someone please clarify the "best practices" for building
> OpenMPI compatible with Slurm?
>
> https://slurm.schedmd.com/mpi_guide.html#open_mpi
> tells me what I _can_ do but I'm unclear as to what I _should_
> do.
>
> I've built OpenMPI 4.1.5 with: --with-pmix
> --with-libevent=internal --with-hwloc=internal --with-slurm.
> If I run an MPI program on my cluster (slurm 18.08.8) with
> "srun -N2 foo" it seems to work fine. (slurm.conf has
> MpiDefault=pmix).
>
> If I "srun --mpi=openmpi -N2 foo" it chokes with:
>
> OPAL_ERROR: Unreachable in file
> ../../../../../opal/mca/pmix/pmix3/pmix3x_client.c at line 112
> -------------------------------------------------------------------------------------------------------------------
> This application appears to have been direct launched
> using "srun",
> but OMPI was not built with SLURM's PMI support and
> therefore cannot
> execute. There are several options for building PMI
> support under
> SLURM, depending upon the SLURM version you are using:
>
> version 16.05 or later: you can use SLURM's PMIx support. This
> requires that you configure and build SLURM --with-pmix.
> .
> .
> .
>
>
> So I guess the question is: what is the "right" way to build
> OpenMPI with Slurm? Is the fact that my non-Slurm pmix works
> "correct", or am I just getting lucky that the various software
> I have happens to be compatible? If I build OpenMPI, am I
> supposed to use Slurm's pmix/libevent/hwloc, or is that
> optional? If it's optional, when and why might I choose to do so?
> If I need Slurm's versions, is there some way to find which
> pmix/libevent/hwloc my current Slurm install is using? Note:
> my sysadmins are not going to be helpful, as they think Slurm
> 18 and OpenMPI 4.0.2a are adequate for users' needs :^(.
>
> I like the idea of _not_ tying my OpenMPI to the installed
> Slurm just in case our support people ever decide to upgrade
> system software.
>
> Thanks.
>