[slurm-users] slurm-wlm package OpenMPI PMIx implementation

Christopher J Cawley ccawley2 at gmu.edu
Thu Dec 10 16:13:26 UTC 2020


I have a 7 node jetson nano cluster running at home.

Send me what you want me to take a look at. If it's not
a big deal, then I can let you know.

Ubuntu 18 / slurm <some version from rpm>

Thanks
Chris


Christopher J. Cawley

Systems Engineer/Linux Engineer, Information Technology Services

223 Aquia Building, Ffx, MSN: 1B5

George Mason University


Phone: (703) 993-6397

Email: ccawley2 at gmu.edu


________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Avery Grieve <agrieve at umich.edu>
Sent: Thursday, December 10, 2020 10:51 AM
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
Subject: [slurm-users] slurm-wlm package OpenMPI PMIx implementation

Hi Forum,

I've been putting together an ARM cluster for fun/learning and I've been a bit lost about how to get OpenMPI and slurm to behave together.

I have installed the slurm-wlm package <https://packages.debian.org/buster/slurm-wlm>, which I found through the Debian apt search, and compiled OpenMPI from source on my compute nodes. OpenMPI was configured with the option --with-slurm, and the configure-time log indicates that OpenMPI has PMIx v3 built in. I thought that, with MpiDefault=pmix_v3 set in slurm.conf, this would be enough to launch a job with "srun -n 4 -N1 executable".

That is not the case, unfortunately: slurm doesn't have any idea what pmix_v3 means without being compiled against it, I guess. I have also attempted to compile OpenMPI from source with the --with-pmi option, but the slurm-wlm package doesn't install any of the libraries/headers (pmi.h, pmi2.h, pmix.h, etc.). Neither do any of the slurm-llnl development packages, so I'm at a loss as to what to do here.
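
One way to check what the packaged srun actually supports is to ask it for its MPI plugin list with "srun --mpi=list" and look for pmix_v3 there. Below is a minimal Python sketch of that check. The helper names (listed_tokens, has_plugin, query_srun) are made up for illustration. The parsing assumes only that plugin names show up as separate words in the output, since the exact layout seems to differ between Slurm versions.

```python
import re
import shutil
import subprocess


def listed_tokens(listing):
    """Split the text printed by `srun --mpi=list` into a set of bare words."""
    tokens = set()
    for line in listing.splitlines():
        line = line.strip()
        # Assumption: some versions prefix each line with "srun:"; drop it if present.
        if line.startswith("srun:"):
            line = line[len("srun:"):]
        tokens.update(t for t in re.split(r"[\s,]+", line) if t)
    return tokens


def has_plugin(listing, name):
    """Return True if `name` appears as its own word in the plugin listing."""
    return name in listed_tokens(listing)


def query_srun():
    """Run `srun --mpi=list`; return its output, or "" if srun is not on PATH."""
    if shutil.which("srun") is None:
        return ""
    result = subprocess.run(
        ["srun", "--mpi=list"], capture_output=True, text=True, check=False
    )
    # Keep both streams: which one carries the list may depend on the
    # Slurm version (an assumption, not something verified here).
    return result.stdout + result.stderr


if __name__ == "__main__":
    listing = query_srun()
    for name in ("pmix_v3", "pmix", "pmi2"):
        print(name, "listed" if has_plugin(listing, name) else "not listed")
```

If pmix_v3 is not listed, then setting MpiDefault=pmix_v3 in slurm.conf cannot work with that build, whatever OpenMPI was configured with.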

A few notes: OpenMPI is working across my compute nodes. I'm able to ssh to a compute node and manually start a job with mpirun that executes successfully across the nodes. My slurmctld and slurmd daemons work for single-threaded resource allocation (and presumably OpenMP multithreading, though I haven't tested this).

Beyond compiling slurm from source (assuming this installs the pmi headers that I can use to build openmpi), which I have tried with no luck on my devices, is there a way to get slurm and openmpi to behave together using the precompiled package slurm-wlm?

Thank you,

~Avery Grieve
They/Them/Theirs please!
University of Michigan