Hi Josef,
On a cluster that uses PXE boot and automatic (re)installation of nodes, I do not think you can do this over IPoIB on an InfiniBand interface.
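To give an idea of what that provisioning path looks like, here is a minimal sketch of a PXE/DHCP setup on the deployment Ethernet network, assuming dnsmasq as the provisioning server (the interface name, address range and paths are made up for the example):

  # /etc/dnsmasq.d/pxe.conf (sketch, hypothetical names and addresses)
  interface=eno2                        # deployment/management Ethernet NIC
  dhcp-range=10.1.0.100,10.1.0.200,12h  # address pool for nodes being (re)installed
  dhcp-boot=pxelinux.0                  # boot file handed to PXE clients
  enable-tftp
  tftp-root=/srv/tftp                   # holds pxelinux.0, kernel and initrd

The whole PXE ROM / DHCP / TFTP exchange happens on the Ethernet NIC; an IPoIB interface only exists once the OS and its IB driver stack are up, which is why deployment stays on Ethernet here.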
On my cluster nodes I have:
- a 1 Gb Ethernet network for OOB
- 10 or 25 Gb Ethernet for session, automatic deployment and management
- IB HDR100 for MPI
Today, data storage is reached via IPoIB on one cluster and via Ethernet on the second one, because it is not located in the same building (old IB QDR setup). I'm working on deploying an additional Ceph storage cluster, and it will also require Ethernet, as there is no IB on these new storage nodes (25 Gb Ethernet only).
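For completeness, the IPoIB storage path is just an ordinary layer-3 interface once the node is installed. A rough sketch of bringing one up with NetworkManager, in connected mode for the larger MTU (connection name, interface name and addresses are hypothetical, not my actual config):

  # create and activate an IPoIB connection for storage traffic
  nmcli connection add type infiniband con-name storage-ib ifname ib0 \
        transport-mode connected mtu 65520 \
        ipv4.method manual ipv4.addresses 10.20.0.11/24
  nmcli connection up storage-ib

At the IP layer nothing special distinguishes the IPoIB cluster from the Ethernet/Ceph one; the difference is only in which fabric carries the traffic.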
These clusters are small (300-400 cores each). The HDR100 IB network is shared with a third cluster in the laboratory (the switch was a joint purchase).
So different technologies can be required together. This represents an investment, but the cost is amortized over a decade or more (my QDR setup is from 2012 and still in production).
Patrick
On 26/02/2024 at 08:59, Josef Dvoracek via slurm-users wrote:
Just looking for some feedback, please. Is this OK? Is there a better way?
I'm tempted to spec all new HPCs with only a high speed (200Gbps) IB network,
Well, you need Ethernet for OOB management (BMC/IPMI/iLO/whatever) anyway... or?
cheers
josef
On 25. 02. 24 21:12, Dan Healy via slurm-users wrote:
This question is not slurm-specific, but it might develop into that.