[slurm-users] openmpi / UCX / srun

Max Quast max at quast.de
Wed Aug 12 10:55:13 UTC 2020


I am also trying to use UCX with Slurm/PMIx and get the same error. In my
case, too, mpirun with "--mca pml ucx" works fine.

Versions used:

Ubuntu 20.04
Slurm 20.02.4
Open MPI 4.0.4
PMIx 3.1.5
UCX 1.9.0-rc1
OFED 4.9

With UCX 1.8.1 I got a slightly different error:

error: host1 [0] pmixp_dconn_ucx.c:245 [pmixp_dconn_ucx_prepare] mpi/pmix: ERROR: Fail to init UCX: Unsupported operation
[2020-08-11T20:24:48.117] [2.0] error: host1 [0] pmixp_dconn.c:72 [pmixp_dconn_init] mpi/pmix: ERROR: Cannot get polling fd
[2020-08-11T20:24:48.117] [2.0] error: host1 [0] pmixp_server.c:402 [pmixp_stepd_init] mpi/pmix: ERROR: pmixp_dconn_init() failed
[2020-08-11T20:24:48.117] [2.0] error: (null) [0] mpi_pmix.c:161 [p_mpi_hook_slurmstepd_prefork] mpi/pmix: ERROR: pmixp_stepd_init() failed
[2020-08-11T20:24:48.119] [2.0] error: Failed mpi_hook_slurmstepd_prefork
[2020-08-11T20:24:48.121] [2.0] error: job_manager exiting abnormally, rc = -1

Did you solve the problem?

Greetings,

Max
