[slurm-users] [EXTERNAL] problems with OpenMPI 4.0.3

Pritchard Jr., Howard howardp at lanl.gov
Mon Jun 1 16:13:11 UTC 2020


Hello Angelines,

Do you know how the Open MPI 4.0.3 package was configured and built? That information would help diagnose the problem.
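
One possible way to collect that information is sketched below in Python. It relies on `ompi_info`, the tool installed with Open MPI, whose output includes a line labelled "Configure command line". The helper names are hypothetical, and the script assumes that the `ompi_info` belonging to the 4.0.3 installation is the first one found in PATH.

```python
import shutil
import subprocess


def extract_configure_line(ompi_info_text):
    """Return the value of the 'Configure command line' entry, or None."""
    for line in ompi_info_text.splitlines():
        # Each ompi_info entry has the form "<label>: <value>".
        label, sep, value = line.partition(":")
        if sep and label.strip().lower() == "configure command line":
            return value.strip()
    return None


def main():
    # Assumption: the ompi_info of the Open MPI build in question is in PATH.
    exe = shutil.which("ompi_info")
    if exe is None:
        print("ompi_info not found in PATH")
        return
    result = subprocess.run([exe], capture_output=True, text=True, check=False)
    configure_line = extract_configure_line(result.stdout)
    print(configure_line or "no 'Configure command line' entry found")


if __name__ == "__main__":
    main()
```

The printed configure options should show, for example, whether the build was given Slurm, PMI, or UCX related flags.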

Thanks,

Howard


From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of "Alberto Morillas, Angelines" <angelines.alberto at ciemat.es>
Reply-To: Slurm User Community List <slurm-users at lists.schedmd.com>
Date: Friday, May 29, 2020 at 4:25 AM
To: "slurm-users at lists.schedmd.com" <slurm-users at lists.schedmd.com>
Subject: [EXTERNAL] [slurm-users] problems with OpenMPI 4.0.3

Good morning,

We have a cluster with two kinds of InfiniBand cards: ConnectX-4 and ConnectX-6.
Open MPI 3.1.3 works fine, but when we started using the ConnectX-6 cards we moved to Open MPI 4.0.3 (which supports ConnectX-6). Since then, programs that run in several parts, first a call to a sequential program and, inside it, a call to a parallel program, … (in our case the program is WRF, but we have others like this with the same problem), suddenly stop:

…..
0 S  4556  87383  87361  0  80   0 - 126676 hrtime ?       00:05:25 real.exe
0 S  4556  87384  87361  0  80   0 - 126677 hrtime ?       00:05:33 real.exe
0 S  4556  87385  87361  0  80   0 - 126675 hrtime ?       00:05:28 real.exe
……
WCHAN is hrtime, so the processes look as if they are running, but in fact they are not doing any work.
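
For readers unfamiliar with the listing: its fields match the long format of `ps` (F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD). That it came from something like `ps -el` is an assumption, since the message does not give the command. A minimal Python sketch, with a hypothetical helper name, that extracts the state and wait channel from those lines:

```python
# Column names of the long ps format (assumed to be how the listing was produced).
PS_LONG_COLUMNS = ["F", "S", "UID", "PID", "PPID", "C", "PRI", "NI",
                   "ADDR", "SZ", "WCHAN", "TTY", "TIME", "CMD"]


def parse_ps_long_line(line):
    """Split one long-format ps line into a dict keyed by column name."""
    # Limit the number of splits so a command containing spaces stays in CMD.
    fields = line.split(None, len(PS_LONG_COLUMNS) - 1)
    if len(fields) != len(PS_LONG_COLUMNS):
        raise ValueError("unexpected number of fields: %r" % line)
    return dict(zip(PS_LONG_COLUMNS, fields))


# The three lines quoted in the message above.
listing = [
    "0 S  4556  87383  87361  0  80   0 - 126676 hrtime ?       00:05:25 real.exe",
    "0 S  4556  87384  87361  0  80   0 - 126677 hrtime ?       00:05:33 real.exe",
    "0 S  4556  87385  87361  0  80   0 - 126675 hrtime ?       00:05:28 real.exe",
]

for line in listing:
    proc = parse_ps_long_line(line)
    print(proc["PID"], proc["S"], proc["WCHAN"], proc["TIME"], proc["CMD"])
```

State S means interruptible sleep, and TIME is accumulated CPU time, so comparing TIME between two samples taken some minutes apart would show whether the processes are still consuming CPU.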

We don't know whether this could be a problem with Slurm and this version of Open MPI. Any ideas?


________________________________________________

Angelines Alberto Morillas

Unidad de Arquitectura Informática
Despacho: 22.1.32
Telf.: +34 91 346 6119
Fax:   +34 91 346 6537

skype: angelines.alberto

CIEMAT
Avenida Complutense, 40
28040 MADRID
________________________________________________

