[slurm-users] slurm-wlm package OpenMPI PMIx implementation
Avery Grieve
agrieve at umich.edu
Thu Dec 10 15:51:34 UTC 2020
Hi Forum,
I've been putting together an ARM cluster for fun/learning and I've been a
bit lost about how to get OpenMPI and slurm to behave together.
I have installed the slurm-wlm package
<https://packages.debian.org/buster/slurm-wlm> from the Debian apt
repositories and compiled OpenMPI from source on my compute nodes. OpenMPI
was configured with the option --with-slurm, and the configure-time log
indicates that OpenMPI has PMIx v3 built in. I thought that, with
MpiDefault=pmix_v3 set in slurm.conf, launching a job with
"srun -n 4 -N1 executable" would just work.
That is not the case, unfortunately: slurm doesn't seem to have any idea
what pmix_v3 means, I guess because it was not compiled against PMIx. I have
also attempted to compile OpenMPI from source with the --with-pmi option,
but the slurm-wlm package doesn't install any of the libraries/headers
(pmi.h, pmi2.h, pmix.h, etc.). Neither do any of the slurm-llnl development
packages, so I'm at a loss as to what to do here.
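
A minimal sketch of how the two observations above could be checked on a node. It assumes a POSIX shell; the function names and the default search prefix /usr/include are illustrative choices, not something the slurm-wlm package provides. "srun --mpi=list" is the standard srun option that prints the MPI plugin types the installed slurm was built with.

```shell
#!/bin/sh
# Report which PMI/PMIx headers exist under a prefix (default: /usr/include).
# The prefix is an assumption; pass another directory as the first argument.
find_pmi_headers() {
    prefix="${1:-/usr/include}"
    for h in pmi.h pmi2.h pmix.h; do
        # Search recursively and keep only the first match, if any.
        found=$(find "$prefix" -name "$h" 2>/dev/null | head -n 1)
        if [ -n "$found" ]; then
            echo "$h: $found"
        else
            echo "$h: missing"
        fi
    done
}

# List the MPI plugin types the installed srun supports, if srun is present.
list_srun_mpi_types() {
    if command -v srun >/dev/null 2>&1; then
        srun --mpi=list
    else
        echo "srun not found in PATH"
    fi
}

find_pmi_headers "$@"
list_srun_mpi_types
```

If pmix_v3 does not show up in the srun listing, MpiDefault=pmix_v3 in slurm.conf cannot be satisfied by that build, which would match the behaviour described above.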
A few notes: OpenMPI is working across my compute nodes. I'm able to ssh to
my compute node and start a job manually with mpirun that executes
successfully across the nodes. My slurmctld and slurmd daemons work for
single-threaded resource allocation (and presumably OpenMP multithreading,
though I haven't tested this).
Beyond compiling slurm from source (assuming this installs the pmi headers
that I can use to build openmpi), which I have tried with no luck on my
devices, is there a way to get slurm and openmpi to behave together using
the precompiled package slurm-wlm?
Thank you,
~Avery Grieve
They/Them/Theirs please!
University of Michigan