[slurm-users] slurm-wlm package OpenMPI PMIx implementation

Avery Grieve agrieve at umich.edu
Fri Dec 11 04:23:41 UTC 2020


Hey Luke,

Just wanted to let you know that your tips helped a lot and I now have srun
able to call OpenMPI as required. It took a bit of finagling with the unit
files, but it does indeed work.
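
For anyone who finds this thread later: a quick way to confirm that srun
knows about PMIx is to look for it in the "srun --mpi=list" output Luke
showed. Here is a minimal sketch. The helper name has_mpi_type is my own
(hypothetical), and it assumes the listing looks like the one quoted below,
one plugin type per line, optionally prefixed with "srun: ".

```shell
#!/bin/sh
# has_mpi_type NAME (hypothetical helper): read "srun --mpi=list" style
# output on stdin and succeed if NAME is listed as an MPI plugin type.
has_mpi_type() {
    # Drop an optional "srun: " prefix and surrounding whitespace, then
    # require an exact whole-line match so "pmix" does not match "pmix_v3".
    sed -e 's/^srun: *//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
        grep -qxF -- "$1"
}

# Usage on a node with Slurm installed (the list may be written to stderr,
# hence the 2>&1):
#   srun --mpi=list 2>&1 | has_mpi_type pmix && echo "pmix is available"
```

If the check fails, slurm was most likely built without PMIx support.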

I've got some warning messages about "undefined symbol: pmix_cb_t_class"
but they are ignored so they're more annoying than anything.

Thank you!

~Avery Grieve
They/Them/Theirs please!
University of Michigan


On Thu, Dec 10, 2020 at 1:30 PM Luke Yeager <lyeager at nvidia.com> wrote:

> The ubuntu package is here: https://packages.ubuntu.com/focal/libpmix-dev
>
>
>
> Yes, we rewrote the service files (see here
> <https://github.com/NVIDIA/nephele-packages/blob/master/slurm/debian/PACKAGE-control.slurmctld.service>)
> and we let debhelper install them to the appropriate location.
>
>
>
>
>
> It seems like you’re wanting to simply get a development build going
> rather than building packages for distribution. Nonetheless, reading
> through the packaging files here might help because it shows how to build
> recent slurm on recent ubuntu/Debian:
> https://github.com/NVIDIA/nephele-packages/tree/master/slurm
>
>
>
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf Of
> *Avery Grieve
> *Sent:* Thursday, December 10, 2020 10:18 AM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] slurm-wlm package OpenMPI PMIx implementation
>
>
>
> Hey Luke,
>
>
>
> Thanks for the response. I should have mentioned I'm on debian. What's the
> name of the ubuntu package for pmix? I'll see if I can track down the
> debian equivalent.
>
>
>
> When you build slurm from scratch you have to place the .service files
> into /etc/init.d and the daemon files in /etc/systemd/system, right? When I
> tried building from source it didn't do that for me (even as root). Not
> sure if intended or if I was missing something.
>
>
>
> Thanks
>
> -ave
>
>
>
> On Thu, Dec 10, 2020, 1:11 PM Luke Yeager <lyeager at nvidia.com> wrote:
>
> Hi Avery,
>
>
>
>    - pmix: we just use the standard Ubuntu packages on 20.04.
>    Unfortunately the standard packages on 18.04 are too out of date for us.
>    - openmpi: we build our own, using ./configure --with-pmix=internal …
>    - slurm: we build our own, using ./configure --with-pmix=PATH … (see
>    here
>    <https://github.com/NVIDIA/nephele-packages/blob/42145aef4bbe2cff335a1fca222766232dab7aa7/slurm/debian/rules#L41>
>    )
>
>
>
> Then we can set MpiDefault=pmix (see here
> <https://github.com/NVIDIA/nephele/blob/1d79977164d5ef1418466bfb322d59d502c18e8f/ansible/roles/slurm/templates/etc/slurm/slurm.conf.default#L87>)
> and it works.
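
To confirm the MpiDefault setting is actually in effect, one option is to
read it back from the running controller. A minimal sketch, assuming
scontrol is on the PATH and that "scontrol show config" prints a line of
the form "MpiDefault = pmix"; the helper name mpi_default is hypothetical.

```shell
#!/bin/sh
# mpi_default (hypothetical helper): read "scontrol show config" style
# output (or slurm.conf lines) on stdin and print the MpiDefault value.
mpi_default() {
    awk -F'=' '$1 ~ /^[ \t]*MpiDefault[ \t]*$/ {
        # Trim whitespace around the value before printing it.
        gsub(/^[ \t]+|[ \t]+$/, "", $2)
        print $2
        exit
    }'
}

# Usage on a node with Slurm installed:
#   scontrol show config | mpi_default
```

An empty result would suggest the option is not set in the loaded config.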
>
>
>
> $ srun --mpi=list
>
> srun: MPI types are...
>
> srun: cray_shasta
>
> srun: pmi2
>
> srun: pmix_v3
>
> srun: pmix
>
> srun: none
>
>
>
> Hope that helps,
>
> Luke
>
>
>
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf Of
> *Avery Grieve
> *Sent:* Thursday, December 10, 2020 7:52 AM
> *To:* slurm-users at lists.schedmd.com
> *Subject:* [slurm-users] slurm-wlm package OpenMPI PMIx implementation
>
>
>
> Hi Forum,
>
>
>
> I've been putting together an ARM cluster for fun/learning and I've been a
> bit lost about how to get OpenMPI and slurm to behave together.
>
>
>
> I have installed the slurm-wlm package
> <https://packages.debian.org/buster/slurm-wlm> from the Debian apt search
> and compiled OpenMPI from source on my compute nodes. OpenMPI has been
> compiled with the option --with-slurm and the configure-time log indicates
> OpenMPI has pmix v3 built in. I thought that would be enough for slurm, and
> that launching a job with "srun -n 4 -N1 executable" (with slurm.conf
> having MpiDefault=pmix_v3) would just work.
>
>
>
> Not the case, unfortunately, as slurm doesn't have any idea what pmix_v3
> means without being compiled against it, I guess. I have also attempted to
> compile OpenMPI from source with the --with-pmi option, but the slurm-wlm
> package doesn't install any of the libraries/headers (pmi.h, pmi2.h, pmix.h,
> etc.). Neither do any of the slurm-llnl development packages, so I'm at a
> loss as to what to do here.
>
>
>
> A few notes: OpenMPI is working across my compute nodes. I'm able to ssh
> to my compute node and start a job manually with mpirun that executes
> successfully across the nodes. My slurmctld and slurmd daemons work for
> single thread resource allocation (and presumably OpenMP multithreading,
> though I haven't tested this).
>
>
>
> Beyond compiling slurm from source (assuming this installs the pmi headers
> that I can use to build openmpi), which I have tried with no luck on my
> devices, is there a way to get slurm and openmpi to behave together using
> the precompiled package slurm-wlm?
>
>
>
> Thank you,
>
>
>
> ~Avery Grieve
>
> They/Them/Theirs please!
>
> University of Michigan
>
>