👍

Davide DelVento reacted via Gmail


On Fri, Mar 28, 2025 at 10:27 AM Matthias Leopold <matthias.leopold@meduniwien.ac.at> wrote:
Hi,

I solved the "ompi/pmix in container" problem. The solution is described
here: https://github.com/NVIDIA/pyxis/wiki/Setup#slurmd-configuration
This enroot hook is also needed:
https://github.com/NVIDIA/enroot/blob/master/conf/hooks/extra/50-slurm-pmi.sh
Both were handled by deepops in my first cluster.
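
For anyone finding this later: the fix boils down to (a) giving slurmd
the PMIx-related environment described on the pyxis wiki page and (b)
enabling the enroot hook so the PMI/PMIx variables and sockets are
passed into the container. A rough sketch of the slurmd part as a
systemd drop-in (variable list reproduced from memory of the wiki, so
please verify against the linked page):

  # /etc/systemd/system/slurmd.service.d/pyxis.conf
  [Service]
  Environment=PMIX_MCA_ptl=^usock PMIX_MCA_psec=native PMIX_SYSTEM_TMPDIR=/var/empty PMIX_MCA_gds=hash

followed by "systemctl daemon-reload && systemctl restart slurmd". The
hook just needs to be copied from conf/hooks/extra/ into the active
enroot hooks directory (e.g. /etc/enroot/hooks.d/) on the compute nodes.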

Now I'm left with a "CUDA devices not visible in all tasks when using a
container" problem, but that's a different story.

Thanks to Howard and Davide for contributing

Matthias

On 27.03.25 at 17:34, Pritchard Jr., Howard wrote:
> Hi Matthias,
>
> Okay, this is useful, and the fact that mpi4py works outside of a
> container is good news.
>
> It might be worth turning on debugging in the Slurm PMIx plugin to see
> if that gives more info. Maybe set PMIxDebug in the mpi.conf file to 1 -
> https://slurm.schedmd.com/mpi.conf.html ?
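>
> For example, a minimal mpi.conf would just be (standard Slurm config
> directory assumed):
>
>   # /etc/slurm/mpi.conf
>   PMIxDebug=1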
>
> Also, if you could investigate which version of PMIx is installed in the container, that would be useful.
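>
> Assuming an Ubuntu/Debian-based image, something along these lines
> should show it:
>
>   dpkg -l | grep -i pmix
>   # or, if the PMIx tools are present in the image:
>   pmix_info | head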
>
> I'm not sure how related the Open MPI issue 12146 is to this one. Are
> you observing any
> PMIX ERROR: OUT-OF-RESOURCE
> or
> UNPACK-PMIX-VALUE: UNSUPPORTED TYPE
> messages in the output from the app before seeing the message you have
> in your original post?
>
> Howard
>
> On 3/27/25, 9:20 AM, "Matthias Leopold" <matthias.leopold@meduniwien.ac.at> wrote:
>
>
> Hi Howard,
>
>
> thanks, but my Slurm 24.05 definitely has pmix support (visible in "srun
> --mpi=list") and it uses it through "MpiDefault=pmix" in slurm.conf. The
> mentioned problem also appears if I use a container with OpenMPI
> compiled against the same pmix as Slurm 24.05 (which is the Ubuntu 24.04
> package libpmix2t64 in this case).
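>
> To illustrate, the relevant pieces in my setup are just:
>
>   # slurm.conf
>   MpiDefault=pmix
>
>   $ srun --mpi=list   # lists pmix among the available plugins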
>
>
> In the meantime I found this bug report:
> https://github.com/open-mpi/ompi/issues/12146 , which sounds very
> similar. I haven't completely worked through it; they use a different
> container solution and things seem to be COMPLICATED...but still...
>
>
> Also, I found that my Slurm 24.05 works with OpenMPI outside of
> containers (with the ompi examples or Python mpi4py).
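>
> For example, a quick mpi4py sanity check like the following (task count
> arbitrary) runs fine outside a container:
>
>   srun --mpi=pmix -n 4 python3 -c "from mpi4py import MPI; c = MPI.COMM_WORLD; print(c.Get_rank(), 'of', c.Get_size())"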
>
>
> Matthias
>
>
> On 27.03.25 at 15:46, Pritchard Jr., Howard wrote:
>> Hi Matthias,
>>
>> It looks like the Open MPI in the containers was not built with PMI1 or
>> PMI2 support, so it's defaulting to using PMIx.
>>
>> You are seeing this error message because the call to PMIx_Init within
>> Open MPI 4.1.x's runtime system returned an error, namely that there
>> was no PMIx server to connect to.
>>
>> Not sure why the behavior would have changed between your SLURM variants.
>>
>> If you run
>>
>> srun --mpi=list
>>
>> does it show a pmix option?
>>
>> If not, you need to rebuild Slurm with the --with-pmix config option. You
>> may want to check what pmix library is installed in the containers and,
>> if possible, use that version of PMIx when rebuilding SLURM.
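>>
>> Roughly, something like this when configuring Slurm (the path is just a
>> placeholder for wherever that PMIx lives):
>>
>>   ./configure --with-pmix=/path/to/pmix && make && make install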
>>
>> Howard
>>
>> *From: *Davide DelVento via slurm-users <slurm-users@lists.schedmd.com>
>> *Reply-To: *Davide DelVento <davide.quantum@gmail.com>
>> *Date: *Thursday, March 27, 2025 at 7:41 AM
>> *To: *Matthias Leopold <matthias.leopold@meduniwien.ac.at>
>> *Cc: *Slurm User Community List <slurm-users@lists.schedmd.com>
>> *Subject: *[EXTERNAL] [slurm-users] Re: [EXTERN] Re: Slurm 24.05 and OpenMPI
>>
>> Hi Matthias,
>>
>> I see. It does not freak me out. Unfortunately I have very little
>> experience working with MPI-in-containers, so I don't know the best way
>> to debug this.
>>
>> What I do know is that some ABIs in Slurm change with Slurm major
>> versions, and dependencies need to be recompiled against the newer
>> version. So maybe recompiling the OpenMPI-inside-the-container against
>> the version of Slurm you are utilizing is the first thing I would try
>> if I were in your shoes.
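>>
>> Something along these lines inside the container build, for example
>> (the PMIx path is a placeholder; the point is to match the PMIx your
>> Slurm pmix plugin was built against):
>>
>>   ./configure --with-slurm --with-pmix=/path/to/pmix && make -j && make install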
>>
>> Best,
>>
>> Davide
>>
>> On Thu, Mar 27, 2025 at 4:19 AM Matthias Leopold
>> <matthias.leopold@meduniwien.ac.at> wrote:
>>
>> Hi Davide,
>>
>> thanks for reply.
>> In my clusters OpenMPI is not present on the compute nodes. The
>> application (nccl-tests) is compiled inside the container against
>> OpenMPI. So when I run the same container in both clusters it's
>> effectively the exact same OpenMPI version. I hope you don't freak out
>> hearing this, but this worked with Slurm 21.08. I tried using a newer
>> container version and another OpenMPI (first it was Ubuntu 20.04 with
>> OpenMPI 4.1.7 from NVIDIA repo, second is Ubuntu 24.04 with Ubuntu
>> OpenMPI 4.1.6), but the error is the same when running the container in
>> Slurm 24.05.
>>
>> Matthias
>>
>> On 26.03.25 at 21:24, Davide DelVento wrote:
>>> Hi Matthias,
>>> Let's take the simplest things out first: have you compiled OpenMPI
>>> yourself, separately on both clusters, using the specific drivers for
>>> whatever network you have on each? In my experience OpenMPI is quite
>>> finicky about working correctly unless you do that. And when I don't,
>>> I see exactly that error -- heck, sometimes I see it even when OpenMPI
>>> is supposed(?) to be compiled and linked correctly, and in such cases
>>> I resolve it by starting jobs with "mpirun --mca smsc xpmem -n $tasks
>>> whatever-else-you-need" (which obviously may or may not be relevant
>>> for your case).
>>> Cheers,
>>> Davide
>>>
>>> On Wed, Mar 26, 2025 at 12:51 PM Matthias Leopold via slurm-users
>>> <slurm-users@lists.schedmd.com> wrote:
>>>
>>> Hi,
>>>
>>> I built a small Slurm 21.08 cluster with NVIDIA GPU hardware and the
>>> NVIDIA deepops framework a couple of years ago. It is based on Ubuntu
>>> 20.04 and makes use of the NVIDIA pyxis/enroot container solution. For
>>> operational validation I used the nccl-tests application in a
>>> container. nccl-tests is compiled with MPI support (OpenMPI 4.1.6 or
>>> 4.1.7) and I used it also for validation of MPI jobs. Slurm jobs use
>>> "pmix" and tasks are launched via srun (not mpirun). Some of the GPUs
>>> can talk to each other via Infiniband, but MPI is rarely used at our
>>> site and I'm fully aware that my MPI knowledge is very limited. Still,
>>> it worked with Slurm 21.08.
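>>>
>>> (For context, such a launch looks roughly like this; the image name
>>> and test parameters below are just placeholders:
>>>
>>> srun --mpi=pmix --container-image=<nccl-tests-image> -N 2 \
>>>     --ntasks-per-node=1 --gpus-per-node=1 \
>>>     all_reduce_perf -b 8 -e 128M -f 2 -g 1
>>> )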
>>>
>>> Now I built a Slurm 24.05 cluster based on Ubuntu 24.04 and started
>>> to move hardware there. When I run my nccl-tests container (also with
>>> newer software) I see error messages like this:
>>>
>>> [node1:21437] OPAL ERROR: Unreachable in file ext3x_client.c at line 111
>>> --------------------------------------------------------------------------
>>> The application appears to have been direct launched using "srun",
>>> but OMPI was not built with SLURM's PMI support and therefore cannot
>>> execute. There are several options for building PMI support under
>>> SLURM, depending upon the SLURM version you are using:
>>>
>>>   version 16.05 or later: you can use SLURM's PMIx support. This
>>>   requires that you configure and build SLURM --with-pmix.
>>>
>>>   Versions earlier than 16.05: you must use either SLURM's PMI-1 or
>>>   PMI-2 support. SLURM builds PMI-1 by default, or you can manually
>>>   install PMI-2. You must then build Open MPI using --with-pmi pointing
>>>   to the SLURM PMI library location.
>>>
>>> Please configure as appropriate and try again.
>>> --------------------------------------------------------------------------
>>> *** An error occurred in MPI_Init
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> *** and potentially your MPI job)
>>> [node1:21437] Local abort before MPI_INIT completed completed
>>> successfully, but am not able to aggregate error messages, and not able
>>> to guarantee that all other processes were killed!
>>>
>>> One simple question:
>>> Is this related to https://github.com/open-mpi/ompi/issues/12471 ?
>>> If so: is there some workaround?
>>>
>>> I'm very grateful for any comments. I know that a lot of detailed
>>> information is missing, but maybe someone can still give me a hint
>>> where to look.
>>>
>>> Thanks a lot
>>> Matthias
>>>
>>>
>>> --
>>> slurm-users mailing list -- slurm-users@lists.schedmd.com
>>> To unsubscribe send an email to slurm-users-leave@lists.schedmd.com
>>>
>>
>>
>
>

--
Medizinische Universität Wien

Matthias Leopold

IT Services & strategisches Informationsmanagement
Enterprise Technology & Infrastructure

Spitalgasse 23, 1090 Wien
T: +43 1 40160 21241

matthias.leopold@meduniwien.ac.at
https://www.meduniwien.ac.at