[slurm-users] Intermittent problem at 32 CPUs

Diego Zuccato diego.zuccato at unibo.it
Mon Jun 8 10:16:35 UTC 2020


On 07/06/20 09:44, Diego Zuccato wrote:

>> I'm *guessing* that you are tripping over the use of "--tasks 32" on a heterogeneous cluster,
> If you mean that using "--tasks 32" triggers the use of a second node,
> then no. The node does have two AMD Opteron 6274 CPUs.
[...]
> I've had a similar problem while adding new nodes in a new partition. I
> (probably) "solved" it by adding the line
> mtl = psm2
> to /etc/openmpi/openmpi-mca-params.conf.
> But those were nodes with IB.
Update: it's not resolved on these nodes either. :(

I have another partition on these new nodes: 4 identical machines, fresh
installation, ConnectX-5 card, dual Intel Xeon 5120 (14 cores, 2 threads
per core). I have no problem running a job requiring 112 threads (on 4
nodes), but I can't run a single-node job with 56 threads.
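
For reference, a minimal sketch of the arithmetic behind those numbers.
The hardware figures are taken from the description above; the even
spread of the 112-thread job over the 4 nodes is an assumption, and the
helper name is hypothetical.

```python
# Hardware figures as described in the message above.
SOCKETS_PER_NODE = 2    # "dual Intel Xeon 5120"
CORES_PER_SOCKET = 14   # 14 cores per CPU
THREADS_PER_CORE = 2    # 2 hardware threads per core
NODES = 4               # 4 identical machines in the partition


def threads_per_node(sockets: int, cores: int, threads: int) -> int:
    """Hypothetical helper: logical CPUs (hardware threads) on one node."""
    return sockets * cores * threads


per_node = threads_per_node(SOCKETS_PER_NODE, CORES_PER_SOCKET, THREADS_PER_CORE)
partition_total = per_node * NODES

# ASSUMPTION: the working 112-thread job is spread evenly over the 4 nodes.
working_job_per_node = 112 // NODES

print("logical CPUs per node:      ", per_node)
print("logical CPUs in partition:  ", partition_total)
print("working job, CPUs per node: ", working_job_per_node)
print("failing job, CPUs per node: ", 56)
```

Under that assumption, the working job occupies only half of each
node's logical CPUs, while the failing single-node job asks for every
logical CPU on one node.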

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786


