[slurm-users] slurm-wlm package OpenMPI PMIx implementation

Avery Grieve agrieve at umich.edu
Thu Dec 10 18:18:11 UTC 2020


Hey Luke,

Thanks for the response. I should have mentioned that I'm on Debian. What's the
name of the Ubuntu package for PMIx? I'll see if I can track down the
Debian equivalent.

When you build Slurm from source, you have to place the .service files into
/etc/systemd/system (or the init scripts into /etc/init.d) yourself, right?
When I tried building from source, the install didn't do that for me (even as
root). I'm not sure whether that's intended or whether I was missing something.
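
For reference, the manual step I have in mind for the systemd case looks roughly like the sketch below. It assumes the build tree leaves generated unit files under its etc/ directory with the names listed in the loop; I haven't confirmed that layout in this thread, and `install_slurm_units` and the example build path are made-up names.

```shell
# Copy generated systemd unit files from a Slurm build tree into a unit directory.
# Assumptions (not confirmed in this thread): the units sit in "$build_dir/etc"
# and are named slurmctld.service, slurmd.service and slurmdbd.service.
install_slurm_units() {
    build_dir=$1
    unit_dir=${2:-/etc/systemd/system}
    copied=0
    for unit in slurmctld.service slurmd.service slurmdbd.service; do
        if [ -f "$build_dir/etc/$unit" ]; then
            cp "$build_dir/etc/$unit" "$unit_dir/$unit" || return 1
            copied=$((copied + 1))
        fi
    done
    echo "copied $copied unit file(s) into $unit_dir"
    # Fail if nothing was found, so a wrong build path is noticed.
    [ "$copied" -gt 0 ]
}

# Example (as root), then reload systemd so it picks up the new units:
#   install_slurm_units "$HOME/slurm-build" && systemctl daemon-reload
```

The second argument defaults to /etc/systemd/system, so it can be pointed at a scratch directory first to see what would be copied.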

Thanks
-ave

On Thu, Dec 10, 2020, 1:11 PM Luke Yeager <lyeager at nvidia.com> wrote:

> Hi Avery,
>
>
>
>    - pmix: we just use the standard Ubuntu packages on 20.04.
>    Unfortunately the standard packages on 18.04 are too out of date for us.
>    - openmpi: we build our own, using ./configure --with-pmix=internal …
>    - slurm: we build our own, using ./configure --with-pmix=PATH … (see
>    here
>    <https://github.com/NVIDIA/nephele-packages/blob/42145aef4bbe2cff335a1fca222766232dab7aa7/slurm/debian/rules#L41>
>    )
>
>
>
> Then we can set MpiDefault=pmix (see here
> <https://github.com/NVIDIA/nephele/blob/1d79977164d5ef1418466bfb322d59d502c18e8f/ansible/roles/slurm/templates/etc/slurm/slurm.conf.default#L87>)
> and it works.
>
>
>
> $ srun --mpi=list
> srun: MPI types are...
> srun: cray_shasta
> srun: pmi2
> srun: pmix_v3
> srun: pmix
> srun: none
>
>
>
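
The check quoted above can be scripted. A minimal sketch: it reads the `srun --mpi=list` output on stdin and tests for an exact plugin name, so "pmi" does not match "pmi2". `has_mpi_plugin` is a hypothetical helper name, and the `2>&1` in the usage line is an assumption about which stream srun prints the list to.

```shell
# Succeed if the plugin named in $1 appears in `srun --mpi=list` output on stdin.
has_mpi_plugin() {
    awk -v want="$1" '
        {
            sub(/^srun:[ \t]*/, "")   # drop the "srun: " prefix if present
            sub(/[ \t]+$/, "")        # drop trailing whitespace
        }
        $0 == want { found = 1 }      # exact match only
        END { exit found ? 0 : 1 }
    '
}

# Usage on a node with Slurm installed:
#   srun --mpi=list 2>&1 | has_mpi_plugin pmix_v3 && echo "pmix_v3 is available"
```

If the name set in MpiDefault is not in that list, srun has no plugin to load for it, which matches the failure described in the original message.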
> Hope that helps,
>
> Luke
>
>
>
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of
> Avery Grieve
> Sent: Thursday, December 10, 2020 7:52 AM
> To: slurm-users at lists.schedmd.com
> Subject: [slurm-users] slurm-wlm package OpenMPI PMIx implementation
>
>
>
> Hi Forum,
>
>
>
> I've been putting together an ARM cluster for fun/learning and I've been a
> bit lost about how to get OpenMPI and slurm to behave together.
>
>
>
> I have installed the slurm-wlm package
> <https://packages.debian.org/buster/slurm-wlm> found through the Debian apt
> search and compiled OpenMPI from source on my compute nodes. OpenMPI was
> compiled with the option --with-slurm, and the configure-time log indicates
> that OpenMPI has PMIx v3 built in. I thought that, with slurm.conf having
> MpiDefault=pmix_v3, launching a job with "srun -n 4 -N1 executable" would be
> enough.
>
>
>
> That's not the case, unfortunately, since Slurm has no idea what pmix_v3
> means without being compiled against it, I guess. I have also attempted to
> compile OpenMPI from source with the --with-pmi option, but the slurm-wlm
> package doesn't install any of the libraries or headers (pmi.h, pmi2.h,
> pmix.h, etc.). Neither do any of the slurm-llnl development packages, so I'm
> at a loss as to what to do here.
>
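
To make the missing-header check repeatable, here is a small sketch that lists which of pmi.h, pmi2.h and pmix.h exist under a set of include directories. `find_pmi_headers` is a hypothetical helper, and looking in a slurm/ subdirectory as well is an assumption about where Slurm's own PMI headers usually land.

```shell
# Print every PMI/PMIx header found under the given include directories.
# Returns non-zero when none of them is present.
find_pmi_headers() {
    found=0
    for dir in "$@"; do
        for header in pmi.h pmi2.h pmix.h; do
            # Assumption: Slurm's pmi.h/pmi2.h may sit in a slurm/ subdirectory.
            for candidate in "$dir/$header" "$dir/slurm/$header"; do
                if [ -f "$candidate" ]; then
                    echo "$candidate"
                    found=1
                fi
            done
        done
    done
    [ "$found" -eq 1 ]
}

# Usage:
#   find_pmi_headers /usr/include /usr/local/include || echo "no PMI headers found"
```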
>
>
> A few notes: OpenMPI is working across my compute nodes. I'm able to ssh
> to a compute node and start a job manually with mpirun that executes
> successfully across the nodes. My slurmctld and slurmd daemons work for
> single-threaded resource allocation (and presumably OpenMP multithreading,
> though I haven't tested this).
>
>
>
> Beyond compiling slurm from source (assuming this installs the pmi headers
> that I can use to build openmpi), which I have tried with no luck on my
> devices, is there a way to get slurm and openmpi to behave together using
> the precompiled package slurm-wlm?
>
>
>
> Thank you,
>
>
>
> ~Avery Grieve
>
> They/Them/Theirs please!
>
> University of Michigan
>