Hello Daniel,

In my experience, if you have a high-speed interconnect such as IB, you would do IPoIB. You would likely still have a "regular" Ethernet connection for management purposes, and yes, that means both an IB switch and an Ethernet switch, but the Ethernet switch doesn't have to be anything special. Any "real" traffic is routed over IB, everything is mounted via IB, etc. That's how the last two clusters I've worked with have been configured, and the next one will be the same (though it will use Omni-Path rather than IB). We likewise use BeeGFS.
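
For what it's worth, the Slurm side of that layout mostly comes down to letting hostnames resolve on the management Ethernet while pointing the node and controller addresses at the IPoIB interfaces, so slurmctld/slurmd traffic rides the IB fabric. A rough slurm.conf fragment, with purely hypothetical hostnames and addresses (10.x management, 192.168.100.x IPoIB), would look something like:

    # Hostnames live on the management Ethernet; the addresses below
    # steer Slurm communication onto the IPoIB interfaces instead.
    SlurmctldHost=head01(192.168.100.1)
    NodeName=node[01-04] NodeAddr=192.168.100.[11-14] CPUs=64 RealMemory=256000 State=UNKNOWN

BeeGFS will talk native RDMA over the HCAs on its own; the IPoIB addresses are mainly there for Slurm and anything else that only speaks IP.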

These next comments are perhaps more likely to encounter differences of opinion, but I would say that sufficiently fast Ethernet is often "good enough" for most workloads, including MPI. I'd wager that for all but the most demanding workloads, it's entirely acceptable, and you'll save a bit of money, of course. HOWEVER, I do think there is, shall we say, an expectation from many researchers that any cluster worth its salt will have some kind of fast interconnect, even if at the scale of most on-prem work you might be hard-pressed, in real-world conditions, to notice much of a difference. If you're running jobs that take weeks across hundreds of nodes, the time (and other) savings may add up; but if we're talking about the difference between a 5-node job taking 48 hours vs. slightly less, then?? Your mileage may vary, as they say...
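
If you want to put a rough number on that for your own codes, one quick test (assuming Open MPI built with UCX, and with made-up interface and application names) is to run the same job once over the IB fabric and once forced onto plain TCP over the Ethernet or IPoIB interface, e.g.:

    # Over IB via UCX (usually the default when UCX is present)
    mpirun --mca pml ucx ./my_app

    # Forced onto plain TCP over the Ethernet interface, for comparison
    mpirun --mca pml ob1 --mca btl tcp,self --mca btl_tcp_if_include eno1 ./my_app

If the wall-clock difference is down in the noise at your typical job sizes, that tells you a lot about whether a fast interconnect is buying you anything for that workload.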

Warmest regards,
Jason

On Sun, Feb 25, 2024 at 3:13 PM Dan Healy via slurm-users <slurm-users@lists.schedmd.com> wrote:
Hi Fellow Slurm Users,

This question is not slurm-specific, but it might develop into that. 

My question relates to understanding how typical HPC clusters are designed in terms of networking. To start, is it typical to have separate high-speed Ethernet and InfiniBand networks (meaning separate switches and NICs)? I know you can easily set up IP over IB, but is IB usually reserved entirely for MPI messages? I'm tempted to spec all new HPC clusters with only a high-speed (200 Gbps) IB network and use IPoIB for all Slurm communication with the compute nodes. I plan on using BeeGFS for the file system, with RDMA.

Just looking for some feedback, please. Is this OK? Is there a better way? If yes, please share why it’s better. 

Thanks,

Daniel Healy


--
Jason L. Simms, Ph.D., M.P.H.
Manager of Research Computing
Swarthmore College
Information Technology Services
(610) 328-8102
Schedule a meeting: https://calendly.com/jlsimms