[slurm-users] [EXTERNAL] OpenMPI and Slurm clarification?

Craig cfreese at super.org
Tue Mar 28 14:33:32 UTC 2023


Ok, thanks.  "Coordinating" with sys admins is problematic so I guess 
I'll just continue with the internal pmix and keep an eye out for problems.

At least I know I'm not doing anything blatantly stupid.
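
For reference, the config.log check discussed below can be sketched as a small shell function. This is only a sketch: the function name pmix_choice is made up for illustration, and it assumes the log contains the "using internal PMIx" warning quoted further down whenever the bundled copy was selected. Other Open MPI versions may word that line differently, so a "not-internal-or-unknown" result needs a manual look.

```shell
# Report whether an Open MPI build fell back to its bundled PMIx.
# Assumption: config.log carries the "using internal PMIx" warning
# (as in the excerpt quoted in this thread) when the internal copy is used.
pmix_choice() {
    log="${1:-config.log}"
    if [ ! -r "$log" ]; then
        echo "unreadable"
        return 1
    fi
    if grep -q "using internal PMIx" "$log"; then
        echo "internal"
    else
        # No warning found: an external PMIx may have been used,
        # or this version of configure phrases things differently.
        echo "not-internal-or-unknown"
    fi
}
```

Run from the Open MPI build directory it would be called as "pmix_choice config.log"; with no argument it looks for config.log in the current directory.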

On 3/27/23 20:46, Pritchard Jr., Howard wrote:
>
> Hi Craig,
>
> It's not essential to build Open MPI against the same pmix lib that was
> used to build the SLURM pmix plugin, but doing so reduces the
> likelihood of problems.
>
> I don’t know how, but there is some way that the admin installing 
> SLURM can “name” the available pmix --mpi options.
>
> For instance on one of our systems, the admin has built multiple 
> variants of the pmix plugin:
>
> MPI plugin types are...
>     cray_shasta
>     none
>     pmi2
>     pmix
> specific pmix plugin versions available: 
> pmix_v2,pmix_v3,pmix_v314,pmix_v4,pmix_v422
>
> This naming convention has helped us with “decoupling” building of 
> Open MPI from SLURM build, but does mean some coordination with the 
> sys admins.
>
> We’re using SLURM 22.05.6
>
> Hope this helps,
>
> Howard
>
> *From: *slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf 
> of Craig <cfreese at super.org>
> *Reply-To: *Slurm User Community List <slurm-users at lists.schedmd.com>
> *Date: *Monday, March 27, 2023 at 2:01 PM
> *To: *"slurm-users at lists.schedmd.com" <slurm-users at lists.schedmd.com>
> *Subject: *Re: [slurm-users] [EXTERNAL] OpenMPI and Slurm clarification?
>
> config.log...
>
>     checking if user requested PMI support
>     result: no
>     checking if user requested internal PMIx support(yes)
>     result: no
>     checking for pmix.h in /usr
>     result: not found
>     checking for pmix.h in /usr/include
>     result: not found
>     WARNING: discovered external PMIx version is less than internal
>     version 3.x
>     WARNING: using internal PMIx
>
> So it looks like it used the internal version (which is what I was 
> aiming for) and that's ok by me since it seems to be working, but if 
> I'm really supposed to be using the same one that SLURM used then I'm 
> gonna have to figure out a way to determine what that was/is.
>
> On 3/27/23 15:28, Pritchard Jr., Howard wrote:
>
>     Hi Craig,
>
>     Your use of --with-pmix on the Open MPI configure line is
>     important.  Without any arguments to this option, Open MPI's
>     configure first checks whether there’s an external PMIx that is
>     newer than the one included in the Open MPI release
>     tarball.  If there is not, the internal PMIx is built.
>
>     You can check in the config.log whether the internal PMIx or an
>     external one was used.
>
>     If you want to be extra careful, find the location of the PMIx v3
>     used to build the SLURM PMIx plugin, and then rebuild your Open
>     MPI 4.1.5 with
>
>     ./configure …
>     --with-pmix=path_to_pmix_used_for_slurm_pmix_plugin_build …
>
>     But you may be okay without doing this. You can check this by
>     running your Open MPI job with
>
>     srun --mpi=pmix_v3 -N2 foo
>
>     and see if it behaves as expected.
>
>     I’m not sure what the “openmpi” result from srun --mpi=list is about.
>
>     Howard
>
>     *From: *slurm-users <slurm-users-bounces at lists.schedmd.com>
>     on behalf of Craig <cfreese at super.org>
>     *Reply-To: *Slurm User Community List
>     <slurm-users at lists.schedmd.com>
>     *Date: *Monday, March 27, 2023 at 12:54 PM
>     *To: *"slurm-users at lists.schedmd.com"
>     <slurm-users at lists.schedmd.com>
>     *Subject: *Re: [slurm-users] [EXTERNAL] OpenMPI and Slurm
>     clarification?
>
>     srun: MPI types are...
>     srun: none
>     srun: openmpi
>     srun: pmix_v3
>     srun: pmi2
>     srun: pmix
>
>     but I'm not sure that tells me much about how I am supposed to be
>     building OpenMPI?
>
>     On 3/27/23 14:41, Pritchard Jr., Howard wrote:
>
>         Hi Craig,
>
>         If you run
>
>         srun --mpi=list
>
>         what does slurm report?
>
>         That will help in determining what argument you want to supply
>         for the --mpi srun option.
>
>         Howard
>
>         *From: *slurm-users <slurm-users-bounces at lists.schedmd.com>
>         on behalf of Craig <cfreese at super.org>
>         *Reply-To: *Slurm User Community List
>         <slurm-users at lists.schedmd.com>
>         *Date: *Monday, March 27, 2023 at 12:38 PM
>         *To: *"slurm-users at lists.schedmd.com"
>         <slurm-users at lists.schedmd.com>
>         *Subject: *[EXTERNAL] [slurm-users] OpenMPI and Slurm
>         clarification?
>
>
>         Can someone please clarify the "best practices" for building
>         OpenMPI compatible with Slurm?
>
>         https://slurm.schedmd.com/mpi_guide.html#open_mpi
>         tells me what I _can_ do but I'm unclear as to what I _should_
>         do.
>
>         I've built OpenMPI 4.1.5 with: --with-pmix
>         --with-libevent=internal --with-hwloc=internal --with-slurm.
>         If I run an MPI program on my cluster (Slurm 18.08.8) with
>         "srun -N2 foo" it seems to work fine (slurm.conf has
>         MpiDefault=pmix).
>
>         If I "srun --mpi=openmpi -N2 foo" it chokes with:
>
>             OPAL_ERROR: Unreachable in file
>             ../../../../../opal/mca/pmix/pmix3/pmix3x_client.c at line 112
>             -------------------------------------------------------------------------------------------------------------------
>             This application appears to have been direct launched
>             using "srun",
>             but OMPI was not build with SLURM's PMI support and
>             therefore cannot
>             execute.  There are several options for building PMI
>             support under
>             SLURM, depending upon the SLURM version you are using:
>
>             version 16.05 or later: you can use SLURM's PMIx support. This
>             requires that you configure and build SLURM --with-pmix.
>             .
>             .
>             .
>
>
>         So I guess the question is, what is the "right" way to build
>         OpenMPI with Slurm?  Is the fact that my non-Slurm pmix works
>         "correct", or am I just getting lucky that the various software
>         I have happens to be compatible?  If I build OpenMPI, am I
>         supposed to use Slurm's pmix/libevent/hwloc, or is that
>         optional?  If it's optional, when/why might I choose to do so?
>         If I need Slurm's versions, is there some way to find which
>         pmix/libevent/hwloc my current Slurm install is using? Note:
>         my sysadmins are not going to be helpful as they think Slurm
>         18 and OpenMPI 4.0.2a are adequate for users' needs :^(.
>
>         I like the idea of _not_ tying my OpenMPI to the installed
>         Slurm just in case our support people ever decide to upgrade
>         system software.
>
>         Thanks.
>
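
Picking a value for srun's --mpi option from the "srun --mpi=list" output shown in the thread could be scripted roughly as follows. This is a sketch under assumptions: list_pmix_plugins is a made-up helper name, and it only relies on the two output layouts quoted above (one plugin per "srun:" line, or a comma-separated list of versions). It does not say which PMIx library a given plugin was built against.

```shell
# Print the distinct pmix-style plugin names found in "srun --mpi=list"
# output read from stdin. Handles both one-name-per-line output and
# comma-separated version lists, as seen in the two excerpts above.
list_pmix_plugins() {
    # Split on spaces, commas and tabs, then keep tokens starting with "pmix".
    tr ' ,\t' '\n\n\n' | grep '^pmix' | sort -u
}
```

A typical call would be "srun --mpi=list 2>&1 | list_pmix_plugins"; the 2>&1 is there because the listing may be written to stderr rather than stdout.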