[slurm-users] [External] Re: openmpi / UCX / srun

Prentice Bisbal pbisbal at pppl.gov
Wed Aug 12 15:39:44 UTC 2020


Max,

You didn't quote the original e-mail so I'm not sure what the original 
problem was, or who "you" is.

--
Prentice

On 8/12/20 6:55 AM, Max Quast wrote:
>
> I am also trying to use UCX with Slurm/PMIx and get the same error.
> Also, mpirun with "--mca pml ucx" works fine.
>
> Used versions:
>
> Ubuntu 20.04
> slurm 20.02.4
> OMPI 4.0.4
> PMIx 3.1.5
> UCX 1.9.0-rc1
> OFED 4.9
>
> With ucx 1.8.1 I got a slightly different error:
>
> error: host1 [0] pmixp_dconn_ucx.c:245 [pmixp_dconn_ucx_prepare] mpi/pmix: ERROR: Fail to init UCX: Unsupported operation
> [2020-08-11T20:24:48.117] [2.0] error: host1 [0] pmixp_dconn.c:72 [pmixp_dconn_init] mpi/pmix: ERROR: Cannot get polling fd
> [2020-08-11T20:24:48.117] [2.0] error: host1 [0] pmixp_server.c:402 [pmixp_stepd_init] mpi/pmix: ERROR: pmixp_dconn_init() failed
> [2020-08-11T20:24:48.117] [2.0] error: (null) [0] mpi_pmix.c:161 [p_mpi_hook_slurmstepd_prefork] mpi/pmix: ERROR: pmixp_stepd_init() failed
> [2020-08-11T20:24:48.119] [2.0] error: Failed mpi_hook_slurmstepd_prefork
> [2020-08-11T20:24:48.121] [2.0] error: job_manager exiting abnormally, rc = -1
>
> Did you solve the problem?
>
> Greetings,
>
> Max
>
