[slurm-users] Intermittent problem at 32 CPUs
Diego Zuccato
diego.zuccato at unibo.it
Tue Jun 9 08:36:53 UTC 2020
On 08/06/20 12:16, Diego Zuccato wrote:
> I have another partition on these new nodes. 4 identical machines, new
> installation, ConnectX-5 card, dual Intel Xeon 5120 (14 core dual
> thread). No problem running a job requiring 112 threads (on 4 nodes),
> but can't run a single-node job with 56 threads.
Well, actually I narrowed the problem down to *one* of the four new nodes
(mtx-01).
Launching the test code on 56 threads always failed.
Once I installed the gdb package to be able to debug it, the problem
disappeared! Even though I never actually ran gdb!
... and there are those who say that gdb is not a great debugger: it catches
bugs just by being there, even if you don't use it! :)
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786