[slurm-users] Slurm version 17.11.0 is now available [PMIx with UCX]

Artem Polyakov artpol84 at gmail.com
Wed Nov 29 16:20:54 MST 2017

Dear friends and colleagues

On behalf of Mellanox HPC R&D, I would like to highlight a feature that we
introduced in Slurm 17.11 that has been shown [1] to significantly improve
the speed and scalability of Slurm jobstart.

Starting with this release, the PMIx plugin supports:
(a) Direct point-to-point connections (Direct-connect) for Out-Of-Band
(OOB) communications. Prior to 17.11 the plugin used the Slurm RPC
mechanism, which is very convenient but has some performance-related
issues. According to our measurements, Direct-connect significantly
improves Slurm/PMIx performance in the direct-modex case [1]. This mode is
turned on by default and uses a TCP-based implementation.
(b) If Slurm is configured with the UCX (http://www.openucx.org/)
communication framework, the PMIx plugin will use a UCX-based
implementation of Direct-connect.
(c) An "Early-wireup" option to pre-connect Slurm step daemons before an
application starts using the OOB channel.

The codebase has been extensively tested by us internally, but we need
broader testing and look forward to hearing about your experience.

This implementation demonstrated good results on a small scale [1]. We are
currently working on obtaining larger-scale results and invite any
interested parties to collaborate. Please contact me through artemp at
mellanox.com if you are interested.

For testing purposes you can use our recently released jobstart project
that we use internally for development:
https://github.com/artpol84/jobstart
It provides a convenient way to deploy, as a regular user, a test Slurm
instance inside an allocation from the legacy Slurm managing the cluster.
Another good thing about this project is that it "bash-documents" the way
we configure the HPC software stack and can be used as a reference.

Some technical details about those features:
1. To build with PMIx and UCX libraries you will need to explicitly
configure with both PMIx and UCX:
$ ./configure --with-pmix=<pmix-path> --with-ucx=<ucx-path>

2. You can select whether Direct-connect is enabled using the
`SLURM_PMIX_DIRECT_CONN={true|false}` environment variable (envar) on a
per-jobstep basis. By default, TCP-based Direct-connect is on if Slurm
wasn't configured with UCX.

3. If UCX support was turned on during configuration, UCX is used by
default for Direct-connect. You can control whether or not UCX is used
through the `SLURM_PMIX_DIRECT_CONN_UCX={true|false}` envar. If UCX wasn't
enabled, this envar is ignored.

4. To enable UCX from the very first OOB communication, we added the
Early-wireup option, which pre-connects the UCX-based communication tree
in parallel with the local portion of MPI/OSHMEM application
initialization. This feature is turned off by default and can be
controlled using `SLURM_PMIX_DIRECT_CONN_EARLY={true|false}`. Once we are
confident in this feature, we plan to turn it on by default.

5. You may also want to specify the UCX network device (e.g.
UCX_NET_DEVICES=mlx5_0:1) and the transport (UCX_TLS=dc). For now it is
recommended to use DC as the transport for jobstart. Full RC support will
be implemented soon. Currently you have to set the global envars (like
UCX_TLS), but in the next release we will introduce prefixed envars (like
UCX_SLURM_TLS and UCX_SLURM_NET_DEVICES) for finer-grained control over
communication resource usage.
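
To make items 2-5 concrete, here is a minimal sketch of a launch script that
sets the envars named above for one job step. The envar names and the
UCX_NET_DEVICES/UCX_TLS values are taken from this message; the application
name (./hello_oshmem), the node count, the RUN switch and the use of
`srun --mpi=pmix` to select the PMIx plugin are assumptions for illustration
and will differ on your system. The script only prints the command unless
RUN=1 is set, so it can be tried without an allocation.

```shell
#!/bin/sh
# Sketch only: per-jobstep PMIx plugin settings from items 2-5 above.

# Item 2: Direct-connect (on by default; set to false to disable it).
export SLURM_PMIX_DIRECT_CONN=true
# Item 3: use UCX for Direct-connect (ignored if Slurm was built without UCX).
export SLURM_PMIX_DIRECT_CONN_UCX=true
# Item 4: Early-wireup (off by default in 17.11).
export SLURM_PMIX_DIRECT_CONN_EARLY=true

# Item 5: global UCX settings. The device name is site-specific;
# mlx5_0:1 is just the example value used in this message.
export UCX_NET_DEVICES=mlx5_0:1
export UCX_TLS=dc

# ASSUMPTION: ./hello_oshmem and "-N 2" are hypothetical placeholders.
LAUNCH="srun --mpi=pmix -N 2 ./hello_oshmem"

# Dry run by default: print the command. Set RUN=1 to actually launch.
if [ "${RUN:-0}" = "1" ]; then
    $LAUNCH
else
    echo "would run: $LAUNCH"
fi
```

Because the envars are read per job step, the same allocation can be used to
compare settings, for example by rerunning the step with
SLURM_PMIX_DIRECT_CONN=false and comparing startup times.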

In the presentation [1] you will also find 2 backup slides explaining how
you can enable the point-to-point and collective micro-benchmarks
integrated into the PMIx plugin to get some basic reference numbers for
the performance on your system.
The jobstart project also contains a simple OSHMEM hello world application
that measures oshmem_init time.

[1] Slides that were presented at the Slurm booth at SC17:

Best regards, Artem Y. Polyakov
Sr. Engineer SW, Mellanox Technologies Inc.