[slurm-users] PMix3 Plugin + Open MPI 4.1.5 broken for heterogeneous jobs with SLURM v21.08.8-2

Bertini, Denis Dr. D.Bertini at gsi.de
Tue Jun 20 06:37:13 UTC 2023


I have made some progress in understanding the problem I reported a few weeks ago.

I noticed that the intermittent connection timeout I am experiencing occurs only when the TCP-based direct connection is used to establish communication between slurmstepd processes on different nodes.

When I disable the optimized direct connection using


hetjob submission is stable and no connection timeouts occur anymore.

Does anyone have an idea what could go wrong when the TCP-based direct connection is used together with hetjobs?


Denis Bertini
Abteilung: CIT
Ort: SB3 2.265a

Tel: +49 6159 71 2240
Fax: +49 6159 71 2986
E-Mail: d.bertini at gsi.de

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the GSI Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
Ministerialdirigent Dr. Volkmar Dietz
