[slurm-users] How to delay the start of slurmd until Infiniband/OPA network is fully up?

Jens Elkner jel+slurm at cs.ovgu.de
Tue Oct 31 16:49:59 UTC 2023


On Tue, Oct 31, 2023 at 10:59:56AM +0100, Ole Holm Nielsen wrote:
Hi Ole,
  
TL;DR: only systemd-networkd stuff below.

> On 10/30/23 20:15, Jeffrey R. Lang wrote:
> > The service is available in RHEL 8 via the EPEL package repository as system-networkd, i.e. systemd-networkd.x86_64                                           253.4-1.el8    epel
> 
> Thanks for the info.  We can install the systemd-networkd RPM from the EPEL
> repo as you suggest.

Strange that it is not installed by default. We use Ubuntu only. The
first LTS which includes it is Xenial (16.04), released in April 2016.
Anyway, we have never installed any NetworkManager stuff (too inflexible,
unreliable, buggy - last eval ~5 years ago and ditched forever); even
before 16.04, and on desktops as well, I ditch[ed] it (IMHO just overhead).

> I tried to understand the properties of systemd-networkd before implementing
> it in our compute nodes.  While there are lots of networkd man-pages, it's
> harder to find an overview of the actual properties of networkd.  This is
> what I found:

Basically you just need a *.network file for each interface, plus a
*.netdev file for each virtual one (VLAN, macvlan, ...), in
/etc/systemd/network/.  Optionally symlink /etc/resolv.conf to
/run/systemd/resolve/resolv.conf.  If you want to rename your
interface[s] (e.g. we use ${hostname}${ifidx}) and the parameter
'net.ifnames=0' gets passed to the kernel, you can use a *.link file to
accomplish this. That's it. See example 1 below.

Some distros have obscure bloatware to manage these files (e.g. Ubuntu
installs 'netplan.io' by default, i.e. yet another layer of indirection),
but we ditch those packages immediately and manage the files "manually"
as needed.
 
> * Comparing systemd-networkd and NetworkManager:
> https://fedoracloud.readthedocs.io/en/latest/networkd.html

Pretty good - shows all you probably need. Actually within containers we
have just /etc/systemd/network/40-${hostname}0.network, because the
lxc.net.* config already describes what *.link and *.netdev would do.
See example 2.
  
...
> While networkd seems to be really nifty, I hesitate to replace

Does/can do all we need w/o a lot of overhead.

> NetworkManager by networkd on our EL8 and EL9 systems because this is an
> unsupported and only lightly tested setup,

We have used it for ~5 years on all machines and for ~7 years on most of
them; multihomed, containers, simple and complex (i.e. a lot of NICs,
VLANs) w/o any problems ...

> and it may require additional
> work to keep our systems up-to-date in the future.

I doubt that. The /etc/systemd/network/*.{link,netdev,network} interface
seems to be pretty stable. I haven't noticed anything that got removed
so far.

> It seems to me that Max Rutkowski's solution in
> https://github.com/maxlxl/network.target_wait-for-interfaces is less
> intrusive than converting to systemd-networkd.

Depends on your setup/environment. But I guess sooner or later you will
need to get in touch with it anyway. So here are some examples:

Example 1:
----------
# /etc/systemd/network/10-mb0.link
# We usually rename eth0, the 1st NIC on the motherboard, to mb0, using
# its PCI address to identify it.
[Match]
Path=pci-0000:00:19.0

[Link]
Name=mb0
MACAddressPolicy=persistent


# /etc/systemd/network/25-phys-2-vlans+vnics.network
[Match]
Name=mb0

[Link]
ARP=false

[Network]
LinkLocalAddressing=no
LLMNR=false
IPv6AcceptRA=no
LLDP=true
MACVLAN=node1_0
#VLAN=vlan2
#VLAN=vlan3


# /etc/systemd/network/40-node1_0.netdev
[NetDev]
Name=node1_0
Kind=macvlan
# Optional: we use fixed MAC addresses on vNICs
MACAddress=00:01:02:03:04:00

[MACVLAN]
Mode=bridge


# /etc/systemd/network/40-node1_0.network
[Match]
Name=node1_0

[Network]
LinkLocalAddressing=no
LLMNR=false
IPv6AcceptRA=no
LLDP=no
Address=10.11.12.13/24
Gateway=10.11.12.200
# stuff which gets copied to /run/systemd/resolve/resolv.conf, when ready
Domains=my.do.main an.other.do.main
DNS=10.11.12.100 10.11.12.101

 
Example 2 (LXC):
----------------
# /zones/n00-00/config
...
lxc.net.0.type = macvlan
lxc.net.0.macvlan.mode = bridge
lxc.net.0.flags = up
lxc.net.0.link = mb0
lxc.net.0.name = n00-00_0
lxc.net.0.hwaddr = 00:01:02:03:04:01
...


# /zones/n00-00/rootfs/etc/systemd/network/40-n00-00_0.network
[Match]
Name=n00-00_0

[Network]
LLMNR=false
LLDP=no
LinkLocalAddressing=no
IPv6AcceptRA=no
Address=10.12.11.0/16
Gateway=10.12.11.2
Domains=gpu.do.main

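
Regarding the original question (delaying slurmd until the IB/OPA
interface is up): below is a minimal, untested sketch of how
/lib/systemd/systemd-networkd-wait-online could be wired in via two
drop-ins. The interface name ib0, the 120 s timeout and both drop-in
file names are assumptions for illustration only, and the binary path
may differ on your distro. The slurmd unit shipped with your Slurm
version may already be ordered after network-online.target, in which
case the second drop-in is redundant.

```ini
# /etc/systemd/system/systemd-networkd-wait-online.service.d/10-ib0.conf
# (hypothetical file name; ib0 and the timeout are assumptions)
[Service]
# The empty ExecStart= clears the command inherited from the shipped unit
ExecStart=
ExecStart=/lib/systemd/systemd-networkd-wait-online --interface=ib0 --timeout=120


# /etc/systemd/system/slurmd.service.d/10-wait-online.conf
# (hypothetical file name)
[Unit]
Wants=network-online.target
After=network-online.target
```

Run 'systemctl daemon-reload' afterwards. Note that
network-online.target only waits if
systemd-networkd-wait-online.service is enabled.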

Have fun,
jel.
> Best regards,
> Ole
> 
> 
> > -----Original Message-----
> > From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Ole Holm Nielsen
> > Sent: Monday, October 30, 2023 1:56 PM
> > To: slurm-users at lists.schedmd.com
> > Subject: Re: [slurm-users] How to delay the start of slurmd until Infiniband/OPA network is fully up?
> > 
> > 
> > Hi Jens,
> > 
> > Thanks for your feedback:
> > 
> > On 30-10-2023 15:52, Jens Elkner wrote:
> > > Actually there is no need for such a script since
> > > /lib/systemd/systemd-networkd-wait-online should be able to handle it.
> > 
> > It seems that systemd-networkd exists in Fedora FC38 Linux, but not in
> > RHEL 8 and clones, AFAICT.

-- 
Otto-von-Guericke University     http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany         Tel: +49 391 67 52768


