[slurm-users] slurm-wlm package OpenMPI PMIx implementation
Avery Grieve
agrieve at umich.edu
Thu Dec 10 16:38:18 UTC 2020
Oops, sorry, I meant to also include the following:
# srun --mpi=list
srun: MPI types are...
srun: none
srun: pmi2
srun: openmpi
Running srun with --mpi=openmpi gives the same errors as with
MpiDefault=none.
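
Since pmi2 does show up in that list, I suspect that if OpenMPI were rebuilt
against Slurm's PMI-2 library (the --with-pmi route discussed below),
launching through that plugin might work; a sketch, with the binary name as a
placeholder:

srun --mpi=pmi2 -n 4 -N 1 ./executable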
~Avery Grieve
They/Them/Theirs please!
University of Michigan
On Thu, Dec 10, 2020 at 11:34 AM Avery Grieve <agrieve at umich.edu> wrote:
> Hi Chris,
>
> Thank you for the offer. Here's some quick information on my system:
>
> All nodes on Debian 10 (armbian buster converted to DietPi v6.33.3).
> sinfo --version: slurm-wlm 18.08.5-2
>
> With MpiDefault=pmix I get the following srun errors:
> srun: error: Couldn't find the specified plugin name for mpi/pmix looking
> at all files
> srun: error: cannot find mpi plugin for mpi/pmix
> srun: error: cannot create mpi context for mpi/pmix
> srun: error: invalid MPI type 'pmix', --mpi=list for acceptable types
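>
> For anyone hitting the same thing: that first error just means the
> mpi_pmix plugin isn't on disk. A quick way to check which MPI plugins the
> package actually shipped, without knowing the plugin directory in advance:
>
> find /usr/lib -name 'mpi_*.so'
>
> I'd expect to see mpi_pmi2.so in the packaged build, but nothing
> pmix-related.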
>
> With MpiDefault=none,
> I get OpenMPI yelling at me and giving me two options, only one of which is
> relevant to the version of Slurm I'm running:
> version 16.05 or later: you can use SLURM's PMIx support. This
> requires that you configure and build SLURM --with-pmix.
>
> However, as I stated, I'm using the slurm-wlm package, which does not seem
> to include PMIx support by default.
>
> The other option provided:
> Versions earlier than 16.05: you must use either SLURM's PMI-1 or
> PMI-2 support. SLURM builds PMI-1 by default, or you can manually
> install PMI-2. You must then build Open MPI using --with-pmi pointing
> to the SLURM PMI library location.
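>
> For reference, that recipe would look roughly like the following; the
> prefix and the PMI path are guesses, and --with-pmi has to point wherever
> pmi2.h and libpmi2 actually live on the system:
>
> ./configure --prefix=/opt/openmpi --with-slurm --with-pmi=/usr
> make -j4 && sudo make install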
>
> Similar issue: Slurm installed from the package, rather than built from
> source, doesn't include the PMI library either. I've installed some
> development-level packages, including the
> libpmi2-0 package <https://packages.debian.org/buster/libpmi2-0>, which
> didn't seem to actually install anything useful as far as I can tell using
> the "find" command.
>
> It's looking like I should go back to building Slurm from source, I
> guess.
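>
> If I do go that route, I assume it looks roughly like this (the version
> number, install prefix, and PMIx location are all placeholders):
>
> wget https://download.schedmd.com/slurm/slurm-20.02.7.tar.bz2
> tar xjf slurm-20.02.7.tar.bz2 && cd slurm-20.02.7
> ./configure --prefix=/opt/slurm --with-pmix=/usr/local/pmix
> make -j4 && sudo make install
> srun --mpi=list   # should then list pmix as an available type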
>
> Thanks,
>
> ~Avery Grieve
> They/Them/Theirs please!
> University of Michigan
>
>
> On Thu, Dec 10, 2020 at 11:16 AM Christopher J Cawley <ccawley2 at gmu.edu>
> wrote:
>
>> I have a 7-node Jetson Nano cluster running at home.
>>
>> Send me what you want me to take a look at. If it's not
>> a big deal, then I can let you know.
>>
>> Ubuntu 18 / slurm <some version from rpm>
>>
>> Thanks
>> Chris
>>
>>
>> Christopher J. Cawley
>>
>> Systems Engineer/Linux Engineer, Information Technology Services
>>
>> 223 Aquia Building, Ffx, MSN: 1B5
>>
>> George Mason University
>>
>> Phone: (703) 993-6397
>>
>> Email: ccawley2 at gmu.edu
>>
>>
>> ------------------------------
>> From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
>> Avery Grieve <agrieve at umich.edu>
>> Sent: Thursday, December 10, 2020 10:51 AM
>> To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
>> Subject: [slurm-users] slurm-wlm package OpenMPI PMIx implementation
>>
>> Hi Forum,
>>
>> I've been putting together an ARM cluster for fun/learning, and I've been
>> a bit lost on how to get OpenMPI and Slurm to behave together.
>>
>> I have installed the slurm-wlm package
>> <https://packages.debian.org/buster/slurm-wlm> from
>> the Debian repositories and compiled OpenMPI from source on my compute nodes.
>> OpenMPI was compiled with the option --with-slurm, and the configure-time
>> log indicates OpenMPI has PMIx v3 built in. I thought that would be enough
>> for Slurm, and that launching a job with "srun -n 4 -N 1 executable" (with
>> slurm.conf having MpiDefault=pmix_v3) would just work.
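>>
>> For context, the only Slurm-side change was that one line; the config
>> path is the Debian default, I believe:
>>
>> # /etc/slurm-llnl/slurm.conf
>> MpiDefault=pmix_v3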
>>
>> Not the case, unfortunately: Slurm has no idea what pmix_v3 means unless
>> it was compiled against PMIx, I guess. I have also attempted to compile
>> OpenMPI from source with the --with-pmi option, but the slurm-wlm package
>> doesn't install any of the PMI libraries/headers (pmi.h, pmi2.h, pmix.h,
>> etc.). Neither do any of the slurm-llnl development packages, so I'm at a
>> loss as to what to do here.
>>
>> A few notes: OpenMPI is working across my compute nodes. I'm able to ssh
>> into a compute node and start a job manually with mpirun that executes
>> successfully across the nodes. My slurmctld and slurmd daemons work for
>> single-thread resource allocation (and presumably OpenMP multithreading,
>> though I haven't tested this).
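>>
>> For example, this sort of manual launch runs fine (hostnames and slot
>> counts are placeholders):
>>
>> mpirun -np 4 --host node1:2,node2:2 ./executable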
>>
>> Beyond compiling Slurm from source (assuming that installs the PMI
>> headers I can use to build OpenMPI), which I have tried on my devices with
>> no luck, is there a way to get Slurm and OpenMPI to behave together using
>> the precompiled slurm-wlm package?
>>
>> Thank you,
>>
>> ~Avery Grieve
>> They/Them/Theirs please!
>> University of Michigan
>>
>