[slurm-users] How to delay the start of slurmd until Infiniband/OPA network is fully up?

Jens Elkner jel+slurm at cs.ovgu.de
Mon Oct 30 14:52:11 UTC 2023


On Mon, Oct 30, 2023 at 03:11:32PM +0100, Ole Holm Nielsen wrote:
Hi Max & friends,
...
> Thanks so much for your fast response with a solution!  I didn't know that
> NetworkManager (falsely) claims that the network is online as soon as the
> first interface comes up :-(

IIRC it is documented in the man page.
  
> Your solution of a wait-for-interfaces Systemd service makes a lot of sense,
> and I'm going to try it out.

Actually there is no need for such a script since
/lib/systemd/systemd-networkd-wait-online should be able to handle it.

I.e. 'ExecStart=/lib/systemd/systemd-networkd-wait-online -i ib0:routable'
or something like that should handle it. E.g. on my laptop the complete
/etc/systemd/system/systemd-networkd-wait-online.service looks like
this:
---schnipp---
[Unit]
Description=Wait for Network to be Configured
Documentation=man:systemd-networkd-wait-online.service(8)
DefaultDependencies=no
Conflicts=shutdown.target
Requires=systemd-networkd.service
After=systemd-networkd.service
Before=network-online.target shutdown.target

[Service]
Type=oneshot
ExecStart=/lib/systemd/systemd-networkd-wait-online -i eth0:routable -i wlan0:routable --any
RemainAfterExit=yes

[Install]
WantedBy=network-online.target
---schnapp---
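
If the interface is not managed by systemd-networkd, a small polling helper
in the spirit of the wait-for-interface approach mentioned below might be an
alternative. The following is only a minimal sketch, not Max's actual
script: the function name wait_for_state is made up, and the sysfs path in
the example comment (adapter hfi1_0, port 1) is an assumption that needs to
be adjusted to the local hardware.

```shell
#!/bin/sh
# Hypothetical helper: poll a state file until it contains the wanted
# string or the timeout (in seconds) expires.
# Usage: wait_for_state FILE PATTERN TIMEOUT
wait_for_state() {
    file=$1 pattern=$2 timeout=$3
    elapsed=0
    while [ "$elapsed" -le "$timeout" ]; do
        # Succeed as soon as the pattern shows up in the file.
        if [ -r "$file" ] && grep -q "$pattern" "$file"; then
            return 0
        fi
        # Sleep between polls, but not after the final attempt.
        [ "$elapsed" -lt "$timeout" ] && sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "timeout waiting for '$pattern' in $file" >&2
    return 1
}

# Example (assumed path; adjust adapter name and port number):
# wait_for_state /sys/class/infiniband/hfi1_0/ports/1/state ACTIVE 60
```

Something like this could be called from an ExecStartPre= line in a drop-in
for slurmd.service, so that slurmd only starts once the port reports ACTIVE.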
 
Have fun,
jel.
> Best regards,
> Ole
> 
> On 10/30/23 14:30, Max Rutkowski wrote:
> > Hi,
> > 
> > we're not using Omni-Path but also had issues with Infiniband taking too
> > long and slurmd failing to start due to that.
> > 
> > Our solution was to implement a little wait-for-interface systemd
> > service which delays the network.target until the ib interface has come
> > up.
> > 
> > Our discovery was that the network-online.target is triggered by the
> > NetworkManager as soon as the first interface is connected.
> > 
> > I've put the solution we use on my GitHub:
> > https://github.com/maxlxl/network.target_wait-for-interfaces
> > 
> > You may need to do small adjustments, but it's pretty straightforward in
> > general.
> -- 
> Ole Holm Nielsen
> PhD, Senior HPC Officer
> Department of Physics, Technical University of Denmark,
> Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
> E-mail: Ole.H.Nielsen at fysik.dtu.dk
> Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
> Mobile: (+45) 5180 1620
> > 
> > 
> > Kind regards
> > Max
> > 
> > On 30.10.23 13:50, Ole Holm Nielsen wrote:
> > > I'm fighting this strange scenario where slurmd is started before
> > > the Infiniband/OPA network is fully up.  The Node Health Check (NHC)
> > > executed by slurmd then fails the node (as it should).  This happens
> > > only on EL8 Linux (AlmaLinux 8.8) nodes, whereas our CentOS 7.9
> > > nodes with Infiniband/OPA network work without problems.
> > > 
> > > Question: Does anyone know how to reliably delay the start of the
> > > slurmd Systemd service until the Infiniband/OPA network is fully up?
> > > 
> > > Note: Our Infiniband/OPA network fabric is Omni-Path 100 Gbit/s, not
> > > Mellanox IB.  On AlmaLinux 8.8 we use the in-distro OPA drivers
> > > since the CornelisNetworks drivers are not available for RHEL 8.8.
> > > 
> > > The details:
> > > 
> > > The slurmd service is started by the service file
> > > /usr/lib/systemd/system/slurmd.service after the
> > > "network-online.target" has been reached.
> > > 
> > > It seems that NetworkManager reports "network-online.target" BEFORE
> > > the Infiniband/OPA device ib0 is actually up, and this seems to be
> > > the cause of our problems!
> > > 
> > > Here are some important sequences of events from the syslog showing
> > > that the network goes online before the Infiniband/OPA network
> > > (hfi1_0 adapter) is up:
> > > 
> > > Oct 30 13:01:40 d064 systemd[1]: Reached target Network is Online.
> > > (lines deleted)
> > > Oct 30 13:01:41 d064 slurmd[2333]: slurmd: error: health_check
> > > failed: rc:1 output:ERROR:  nhc:  Health check failed: check_hw_ib: 
> > > No IB port is ACTIVE (LinkUp 100 Gb/sec).
> > > (lines deleted)
> > > Oct 30 13:01:41 d064 kernel: hfi1 0000:4b:00.0: hfi1_0: 8051: Link up
> > > Oct 30 13:01:41 d064 kernel: hfi1 0000:4b:00.0: hfi1_0:
> > > set_link_state: current GOING_UP, new INIT (LINKUP)
> > > Oct 30 13:01:41 d064 kernel: hfi1 0000:4b:00.0: hfi1_0: physical
> > > state changed to PHYS_LINKUP (0x5), phy 0x50
> > > 
> > > I tried to delay the NetworkManager "network-online.target" by
> > > setting a wait on the ib0 device and rebooting, but that seems to be
> > > ignored:
> > > 
> > > $ nmcli -p connection modify "System ib0"
> > > connection.wait-device-timeout 20
> > > 
> > > I'm hoping that other sites using Omni-Path have seen this and maybe
> > > can share a fix or workaround?
> > > 
> > > Of course we could remove the Infiniband check in Node Health Check
> > > (NHC), but that would not really be acceptable during operations.
> > > 
> > > Thanks for sharing any insights,
> > > Ole
> > > 
> > -- 
> > Max Rutkowski
> > IT-Services und IT-Betrieb
> > Tel.: +49 (0)331/6264-2341
> > E-Mail: max.rutkowski at gfz-potsdam.de
> > ___________________________________
> > 
> > Helmholtz-Zentrum Potsdam
> > *Deutsches GeoForschungsZentrum GFZ*
> > Stiftung des öff. Rechts Land Brandenburg
> > Telegrafenberg, 14473 Potsdam

-- 
Otto-von-Guericke University     http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany         Tel: +49 391 67 52768


