[slurm-users] slurm-wlm package OpenMPI PMIx implementation

Luke Yeager lyeager at nvidia.com
Thu Dec 10 18:27:41 UTC 2020


The Ubuntu package is here: https://packages.ubuntu.com/focal/libpmix-dev

Yes, we rewrote the service files (see here<https://github.com/NVIDIA/nephele-packages/blob/master/slurm/debian/PACKAGE-control.slurmctld.service>) and we let debhelper install them to the appropriate location.


It sounds like you want a development build rather than packages for distribution. Even so, reading through the packaging files here might help, because they show how to build a recent Slurm on recent Ubuntu/Debian: https://github.com/NVIDIA/nephele-packages/tree/master/slurm

From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Avery Grieve
Sent: Thursday, December 10, 2020 10:18 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] slurm-wlm package OpenMPI PMIx implementation

External email: Use caution opening links or attachments

Hey Luke,

Thanks for the response. I should have mentioned I'm on Debian. What's the name of the Ubuntu package for PMIx? I'll see if I can track down the Debian equivalent.

When you build Slurm from scratch you have to place the init scripts into /etc/init.d and the .service files in /etc/systemd/system, right? When I tried building from source it didn't do that for me (even as root). I'm not sure whether that is intended or whether I was missing something.

Thanks
-ave

On Thu, Dec 10, 2020, 1:11 PM Luke Yeager <lyeager at nvidia.com> wrote:
Hi Avery,


  *   pmix: we just use the standard Ubuntu packages on 20.04. Unfortunately the standard packages on 18.04 are too out of date for us.
  *   openmpi: we build our own, using ./configure --with-pmix=internal …
  *   slurm: we build our own, using ./configure --with-pmix=PATH … (see here<https://github.com/NVIDIA/nephele-packages/blob/42145aef4bbe2cff335a1fca222766232dab7aa7/slurm/debian/rules#L41>)
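The two configure steps above can be sketched as a pair of shell functions. This is only an illustration: the function names, the RUN dry-run switch, and the PMIX_PREFIX default are assumptions, not taken from the packaging files, and the real PATH and the remaining configure flags are elided in the message, so they are left for the reader to supply.

```shell
# Sketch only: RUN defaults to "echo", so each function prints the command
# instead of running it. Set RUN= (empty) inside a source tree to execute.
RUN=${RUN-echo}

# Hypothetical PMIx install prefix; the message above elides the real PATH.
PMIX_PREFIX=${PMIX_PREFIX-/usr}

# Run from the OpenMPI source tree: use the PMIx copy bundled with OpenMPI.
# Extra arguments are passed through to configure unchanged.
configure_openmpi() {
    $RUN ./configure --with-pmix=internal "$@"
}

# Run from the Slurm source tree: point configure at the PMIx installation
# under PMIX_PREFIX.
configure_slurm() {
    $RUN ./configure --with-pmix="$PMIX_PREFIX" "$@"
}
```

With the defaults, calling configure_slurm prints the command it would run, which makes it easy to check the prefix before committing to a build.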

Then we can set MpiDefault=pmix (see here<https://github.com/NVIDIA/nephele/blob/1d79977164d5ef1418466bfb322d59d502c18e8f/ansible/roles/slurm/templates/etc/slurm/slurm.conf.default#L87>) and it works.

$ srun --mpi=list
srun: MPI types are...
srun: cray_shasta
srun: pmi2
srun: pmix_v3
srun: pmix
srun: none
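To check for the plugin from a script rather than by eye, a small filter over the listing above may be enough. This is a minimal sketch: has_pmix_plugin is a hypothetical helper name, and the pattern assumes the plugin names appear one per line as in the output shown, optionally prefixed with "srun: ".

```shell
# Reads the output of `srun --mpi=list` on stdin and succeeds if a pmix
# plugin (pmix, pmix_v3, ...) is listed. Hypothetical helper, not part of Slurm.
has_pmix_plugin() {
    grep -Eq '^[[:space:]]*(srun: )?pmix(_v[0-9]+)?[[:space:]]*$'
}

# Example use on a node with Slurm installed (stderr is merged in case the
# listing is written there):
#   srun --mpi=list 2>&1 | has_pmix_plugin && echo "pmix plugin available"
```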

Hope that helps,
Luke

From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Avery Grieve
Sent: Thursday, December 10, 2020 7:52 AM
To: slurm-users at lists.schedmd.com
Subject: [slurm-users] slurm-wlm package OpenMPI PMIx implementation

Hi Forum,

I've been putting together an ARM cluster for fun/learning and I've been a bit lost about how to get OpenMPI and slurm to behave together.

I have installed the slurm-wlm package <https://packages.debian.org/buster/slurm-wlm> from the Debian repositories (found via apt search) and compiled OpenMPI from source on my compute nodes. OpenMPI was compiled with the option --with-slurm, and the configure-time log indicates that OpenMPI has PMIx v3 built in. I thought that, with MpiDefault=pmix_v3 in slurm.conf, this would be enough to launch a job with "srun -n 4 -N1 executable".

Unfortunately that is not the case; I guess Slurm has no idea what pmix_v3 means unless it was compiled against PMIx. I have also tried compiling OpenMPI from source with the --with-pmi option, but the slurm-wlm package doesn't install any of the libraries or headers (pmi.h, pmi2.h, pmix.h, etc.). Neither do any of the slurm-llnl development packages, so I'm at a loss as to what to do here.
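One way to confirm which of those headers are actually present is to search a few include directories directly. The sketch below assumes nothing about Slurm or OpenMPI: find_header is a hypothetical helper, and the directories in the example are guesses to adjust for your system.

```shell
# Prints the first directory that contains the named header and returns
# non-zero if none of the given directories has it. Hypothetical helper.
find_header() {
    header=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$header" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example (directories are assumptions):
#   find_header pmix.h /usr/include /usr/local/include || echo "pmix.h not found"
```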

A few notes: OpenMPI is working across my compute nodes. I'm able to ssh to my compute node and start a job manually with mpirun that executes successfully across the nodes. My slurmctld and slurmd daemons work for single-threaded resource allocation (and presumably OpenMP multithreading, though I haven't tested this).

Beyond compiling slurm from source (assuming this installs the pmi headers that I can use to build openmpi), which I have tried with no luck on my devices, is there a way to get slurm and openmpi to behave together using the precompiled package slurm-wlm?

Thank you,

~Avery Grieve
They/Them/Theirs please!
University of Michigan