[slurm-users] slurm-wlm package OpenMPI PMIx implementation
Luke Yeager
lyeager at nvidia.com
Thu Dec 10 18:09:11 UTC 2020
Hi Avery,
* pmix: we just use the standard Ubuntu packages on 20.04. Unfortunately the standard packages on 18.04 are too out of date for us.
* openmpi: we build our own, using ./configure --with-pmix=internal …
* slurm: we build our own, using ./configure --with-pmix=PATH … (see <https://github.com/NVIDIA/nephele-packages/blob/42145aef4bbe2cff335a1fca222766232dab7aa7/slurm/debian/rules#L41>)
Then we can set MpiDefault=pmix (see <https://github.com/NVIDIA/nephele/blob/1d79977164d5ef1418466bfb322d59d502c18e8f/ansible/roles/slurm/templates/etc/slurm/slurm.conf.default#L87>) and it works.
$ srun --mpi=list
srun: MPI types are...
srun: cray_shasta
srun: pmi2
srun: pmix_v3
srun: pmix
srun: none
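If you want to script that check, something like the following might do. It is only a rough sketch: has_mpi_type is a made-up helper name, and it assumes the listing looks like the output above (one plugin name per line, optionally prefixed with "srun: "), which may differ between Slurm versions.

```shell
# Hypothetical helper (not part of Slurm): reads `srun --mpi=list` style
# output on stdin and succeeds if the plugin named in $1 is listed.
has_mpi_type() {
    # Drop the "srun: " prefix and surrounding whitespace, then require an
    # exact whole-line, fixed-string match so "pmix" does not match "pmix_v3".
    sed -e 's/^srun: //' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | grep -Fqx -- "$1"
}

# Usage on a node with Slurm installed. The 2>&1 is there on the assumption
# that the listing may be written to stderr rather than stdout:
#   srun --mpi=list 2>&1 | has_mpi_type pmix && echo "pmix plugin available"
```

If the check fails for pmix, the installed Slurm was most likely built without PMIx support, and setting MpiDefault=pmix would not be expected to work.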
Hope that helps,
Luke
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Avery Grieve
Sent: Thursday, December 10, 2020 7:52 AM
To: slurm-users at lists.schedmd.com
Subject: [slurm-users] slurm-wlm package OpenMPI PMIx implementation
Hi Forum,
I've been putting together an ARM cluster for fun/learning and I've been a bit lost about how to get OpenMPI and slurm to behave together.
I have installed the slurm-wlm package <https://packages.debian.org/buster/slurm-wlm> from the Debian apt repository and compiled OpenMPI from source on my compute nodes. OpenMPI was configured with the option --with-slurm, and the configure-time log indicates that OpenMPI has PMIx v3 built in. I thought that would be enough for Slurm, and that launching a job with "srun -n 4 -N1 executable" (with MpiDefault=pmix_v3 in slurm.conf) would just work.
Unfortunately that is not the case: Slurm doesn't have any idea what pmix_v3 means without being compiled against it, I guess. I have also attempted to compile OpenMPI from source with the --with-pmi option, but the slurm-wlm package doesn't install any of the libraries or headers (pmi.h, pmi2.h, pmix.h, etc.). Neither do any of the slurm-llnl development packages, so I'm at a loss as to what to do here.
A few notes: OpenMPI is working across my compute nodes. I'm able to ssh to a compute node and manually start a job with mpirun that executes successfully across the nodes. My slurmctld and slurmd daemons work for single-threaded resource allocation (and presumably OpenMP multithreading, though I haven't tested this).
Beyond compiling slurm from source (assuming this installs the pmi headers that I can use to build openmpi), which I have tried with no luck on my devices, is there a way to get slurm and openmpi to behave together using the precompiled package slurm-wlm?
Thank you,
~Avery Grieve
They/Them/Theirs please!
University of Michigan