[slurm-users] [EXTERNAL] problems with OpenMPI 4.0.3
Pritchard Jr., Howard
howardp at lanl.gov
Mon Jun 1 16:13:11 UTC 2020
Hello Angelines,
Do you know how the Open MPI 4.0.3 package was configured and built? That information would be useful to help diagnose the problem.
Thanks,
Howard
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of "Alberto Morillas, Angelines" <angelines.alberto at ciemat.es>
Reply-To: Slurm User Community List <slurm-users at lists.schedmd.com>
Date: Friday, May 29, 2020 at 4:25 AM
To: "slurm-users at lists.schedmd.com" <slurm-users at lists.schedmd.com>
Subject: [EXTERNAL] [slurm-users] problems with OpenMPI 4.0.3
Good morning,
We have a cluster with two kinds of InfiniBand cards: ConnectX-4 and ConnectX-6.
openmpi-3.1.3 works fine, but when we started using the ConnectX-6 cards we moved to openmpi-4.0.3 (which supports ConnectX-6). Since then, programs made up of several parts, where a sequential program runs first and calls a parallel program from inside it (in our case the program is WRF, but we have others like this with the same problem), suddenly stop:
…..
0 S 4556 87383 87361 0 80 0 - 126676 hrtime ? 00:05:25 real.exe
0 S 4556 87384 87361 0 80 0 - 126677 hrtime ? 00:05:33 real.exe
0 S 4556 87385 87361 0 80 0 - 126675 hrtime ? 00:05:28 real.exe
……
The WCHAN is hrtime, and the processes look as if they are running, but in reality they are not doing any work.
We don't know whether this could be a problem between Slurm and this version of Open MPI. Any ideas?
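One way to confirm that the ranks are truly stalled, rather than just slow, is to take the process listing twice and check whether the accumulated CPU time moves. The following is only a minimal sketch: it assumes the listing above comes from `ps -el` (14 columns, with WCHAN in the 11th and TIME in the 13th), and the helper names `parse_ps_line`, `cpu_seconds` and `stalled_pids` are hypothetical, introduced here for illustration.

```python
# Assumed column layout of `ps -el`; adjust if the listing was produced differently.
PS_COLUMNS = ["F", "S", "UID", "PID", "PPID", "C", "PRI", "NI",
              "ADDR", "SZ", "WCHAN", "TTY", "TIME", "CMD"]


def parse_ps_line(line):
    """Split one `ps -el` row into a dict keyed by column name."""
    fields = line.split(None, len(PS_COLUMNS) - 1)
    if len(fields) != len(PS_COLUMNS):
        raise ValueError("unexpected number of columns: %r" % line)
    return dict(zip(PS_COLUMNS, fields))


def cpu_seconds(time_field):
    """Convert a TIME value of the form [DD-]HH:MM:SS to seconds."""
    days = 0
    if "-" in time_field:
        day_part, time_field = time_field.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(x) for x in time_field.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def stalled_pids(before_lines, after_lines):
    """Return PIDs present in both samples whose CPU time did not increase."""
    before = {}
    for line in before_lines:
        row = parse_ps_line(line)
        before[row["PID"]] = cpu_seconds(row["TIME"])
    stalled = []
    for line in after_lines:
        row = parse_ps_line(line)
        pid = row["PID"]
        if pid in before and cpu_seconds(row["TIME"]) <= before[pid]:
            stalled.append(pid)
    return sorted(stalled)
```

Feeding it two samples of the real.exe rows taken a few minutes apart would list the PIDs whose TIME column has not advanced in between.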
________________________________________________
Angelines Alberto Morillas
Unidad de Arquitectura Informática
Despacho: 22.1.32
Telf.: +34 91 346 6119
Fax: +34 91 346 6537
skype: angelines.alberto
CIEMAT
Avenida Complutense, 40
28040 MADRID
________________________________________________