[slurm-users] Slurm-PMIx integration
rhc at pmix.org
Thu Mar 3 03:33:08 UTC 2022
Current Slurm official repository branches contain only a limited PMIx integration, primarily constrained to providing support for the traditional put-get exchange of key-value pairs. The support is also restricted to PMIx v3.x releases and below, though this restriction is due to a configure limitation as opposed to any compatibility issues with the PMIx library.
Given the emergence of new features in programming libraries such as MPI and OSHMEM that rely on more recent PMIx releases, and the sundowning of support for the older PMIx series, organizations may find themselves needing (or at least wanting) support for PMIx v4.x and later releases. The optimal solution would be for this support to be available from the official repository and associated releases, and we are continuing to work towards that goal.
In the meantime, projects that involve (to one degree or another) the use of PMIx within Slurm have started. As part of their overall effort, these projects will extend the current PMIx integration to embrace the full range of PMIx operations. Much of this work will remain private pending publication, but some of the basic PMIx integration can be made available to interested organizations as it is completed.
Until these capabilities can be upstreamed, several organizations are teaming to provide two paths forward.
First, we offer patches that can be applied to official Slurm releases to upgrade the PMIx support. The patches (https://github.com/slurm-pmix/slurm/wiki/Patches) are based on the head of the Slurm master branch and should apply cleanly to recent releases. Problems with a patch should be reported on that repository's "issues" page (https://github.com/slurm-pmix/slurm/issues). We will maintain a list of patches (each marked with the date and commit hash on which it was based) as work continues on adding support for a broader range of PMIx features.
Second, we remind users that they can use the PMIx Reference RunTime Environment (PRRTE, https://github.com/openpmix/prrte) to resolve this issue. Once a user has obtained an allocation, simply execute prte to instantiate the persistent Distributed Virtual Machine (DVM). The PRRTE DVM contains support for the full range of PMIx features, thereby providing a complete environment for advanced features such as MPI Sessions and dynamic operations, multi-application workflows, and novel programming models such as the "sea of MPI" (to be described soon on the PRRTE site).
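As a rough illustration of the DVM workflow described above, the following Python sketch builds the command sequence one might run inside an existing Slurm allocation: start the DVM with prte, launch an application into it with prun, then shut it down with pterm. The helper names (dvm_commands, run_in_dvm), the application path, and the specific options shown (--daemonize, -n) are assumptions for illustration only; check them against the PRRTE version installed at your site. The sketch defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def dvm_commands(app, nprocs):
    """Return the assumed command sequence for one PRRTE DVM session.

    app    -- application command line as a list, e.g. ["./my_app"]
    nprocs -- number of processes to launch with prun
    """
    return [
        # Start the persistent DVM in the background (option is an assumption).
        ["prte", "--daemonize"],
        # Launch the application into the running DVM.
        ["prun", "-n", str(nprocs), *app],
        # Terminate the DVM once the work is done.
        ["pterm"],
    ]


def run_in_dvm(app, nprocs, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    for cmd in dvm_commands(app, nprocs):
        if dry_run:
            print("would run:", shlex.join(cmd))
        else:
            # check=True stops the sequence if any step fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "./my_app" is a placeholder application name.
    run_in_dvm(["./my_app"], nprocs=4)
```

Calling run_in_dvm(..., dry_run=False) from within an allocation would execute the three steps for real. Multiple prun invocations can be issued against the same DVM before it is terminated, which is the main reason to keep the DVM persistent.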
For ease of use in transitioning to PRRTE, the PMIx community is working on an "srun" personality for that environment. The base launcher for PRRTE is "prun", which has a command line similar but not identical to that of "srun". However, PRRTE supports customized command lines, and we are using that capability to create an "srun" wrapper. When complete, the "srun" command provided by PRRTE will behave the same as the native Slurm command, but will execute the specified application using PRRTE.
It is our hope that organizations will find one (or both) of these options helpful in meeting their needs until a longer term solution is achieved.