[slurm-users] How to delay the start of slurmd until Infiniband/OPA network is fully up?
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Mon Oct 30 14:11:32 UTC 2023
Hi Max,
Thanks so much for your fast response with a solution! I didn't know that
NetworkManager (falsely) claims that the network is online as soon as the
first interface comes up :-(
Your solution of a wait-for-interfaces Systemd service makes a lot of
sense, and I'm going to try it out.
Best regards,
Ole
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: Ole.H.Nielsen at fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620
On 10/30/23 14:30, Max Rutkowski wrote:
> Hi,
>
> We're not using Omni-Path but also had issues with Infiniband taking too
> long and slurmd failing to start due to that.
>
> Our solution was to implement a little wait-for-interface systemd service
> which delays the network.target until the ib interface has come up.
>
> Our discovery was that the network-online.target is triggered by
> NetworkManager as soon as the first interface is connected.
>
> I've put the solution we use on my GitHub:
> https://github.com/maxlxl/network.target_wait-for-interfaces
>
> You may need to do small adjustments, but it's pretty straightforward in
> general.
>
>
> Kind regards
> Max
>
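A minimal sketch of the waiting part of such a service, assuming the port state is read from the kernel's sysfs file /sys/class/infiniband/hfi1_0/ports/1/state, which holds a line such as `4: ACTIVE`. The script name wait-for-ib-active.sh, the device hfi1_0 and port 1 are placeholders, and this is not the code from the repository linked above.

```shell
#!/bin/bash
# Hypothetical helper (e.g. /usr/local/sbin/wait-for-ib-active.sh):
# block until an InfiniBand/OPA port reports ACTIVE, or give up.
# Assumption: the port state comes from the kernel's sysfs file, e.g.
#   /sys/class/infiniband/hfi1_0/ports/1/state  (contents like "4: ACTIVE")

# wait_for_port_active STATE_FILE [TIMEOUT_SECONDS]
# Returns 0 as soon as STATE_FILE reads ACTIVE, 1 once the timeout expires.
wait_for_port_active() {
    local state_file="$1"
    local timeout="${2:-60}"
    local waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # The file may not exist yet while the driver is still loading.
        # Match ": ACTIVE" at end of line so "5: ACTIVE_DEFER" does not count.
        if [ -r "$state_file" ] && grep -q ': ACTIVE$' "$state_file"; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "timed out after ${timeout}s waiting for $state_file to read ACTIVE" >&2
    return 1
}

# Run the wait only when arguments are given (e.g. from a systemd
# ExecStart= line), so the file can also be sourced for testing.
if [ "$#" -gt 0 ]; then
    wait_for_port_active "$@"
    exit $?
fi
```

Called as `wait-for-ib-active.sh /sys/class/infiniband/hfi1_0/ports/1/state 120`, it would exit 0 once the port is ACTIVE and 1 after 120 seconds, so a oneshot systemd unit ordered before slurmd could run it.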
> On 30.10.23 13:50, Ole Holm Nielsen wrote:
>> I'm fighting this strange scenario where slurmd is started before the
>> Infiniband/OPA network is fully up. The Node Health Check (NHC)
>> executed by slurmd then fails the node (as it should). This happens
>> only on EL8 Linux (AlmaLinux 8.8) nodes, whereas our CentOS 7.9 nodes
>> with Infiniband/OPA network work without problems.
>>
>> Question: Does anyone know how to reliably delay the start of the slurmd
>> Systemd service until the Infiniband/OPA network is fully up?
>>
>> Note: Our Infiniband/OPA network fabric is Omni-Path 100 Gbit/s, not
>> Mellanox IB. On AlmaLinux 8.8 we use the in-distro OPA drivers since
>> the CornelisNetworks drivers are not available for RHEL 8.8.
>>
>> The details:
>>
>> The slurmd service is started by the service file
>> /usr/lib/systemd/system/slurmd.service after the "network-online.target"
>> has been reached.
>>
>> It seems that NetworkManager reports "network-online.target" BEFORE the
>> Infiniband/OPA device ib0 is actually up, and this seems to be the cause
>> of our problems!
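One way to make slurmd wait regardless of when NetworkManager declares network-online.target reached is a systemd drop-in that orders slurmd after a dedicated oneshot "waiter" unit. The shell sketch below writes both files into a given directory. The unit name wait-for-ib.service, the helper script /usr/local/sbin/wait-for-ib-active.sh (assumed to exit 0 once the port state reads ACTIVE and non-zero on timeout), the device hfi1_0 and port 1 are all hypothetical placeholders.

```shell
#!/bin/bash
# Sketch: write a oneshot "wait for IB/OPA" unit plus a slurmd drop-in.
# Hypothetical names: wait-for-ib.service, /usr/local/sbin/wait-for-ib-active.sh,
# device hfi1_0, port 1.

install_wait_units() {
    local unit_dir="${1:-/etc/systemd/system}"

    cat > "$unit_dir/wait-for-ib.service" <<'EOF'
[Unit]
Description=Wait for the Omni-Path/InfiniBand port to become ACTIVE
After=network.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/sbin/wait-for-ib-active.sh /sys/class/infiniband/hfi1_0/ports/1/state 120
# oneshot units have the start timeout disabled by default; cap it here.
TimeoutStartSec=150
EOF

    # Drop-in: slurmd pulls in the waiter and starts only after it finishes.
    mkdir -p "$unit_dir/slurmd.service.d"
    cat > "$unit_dir/slurmd.service.d/wait-for-ib.conf" <<'EOF'
[Unit]
Wants=wait-for-ib.service
After=wait-for-ib.service
EOF
}

# Only write files when a target directory is given explicitly; afterwards
# run "systemctl daemon-reload" so systemd picks up the new files.
if [ "$#" -gt 0 ]; then
    install_wait_units "$1"
fi
```

Using Wants= rather than Requires= in the drop-in means slurmd still starts if the wait times out, so NHC still gets to fail the node instead of slurmd silently not running.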
>>
>> Here are some important sequences of events from the syslog showing that
>> the network goes online before the Infiniband/OPA network (hfi1_0
>> adapter) is up:
>>
>> Oct 30 13:01:40 d064 systemd[1]: Reached target Network is Online.
>> (lines deleted)
>> Oct 30 13:01:41 d064 slurmd[2333]: slurmd: error: health_check failed:
>> rc:1 output:ERROR: nhc: Health check failed: check_hw_ib: No IB port
>> is ACTIVE (LinkUp 100 Gb/sec).
>> (lines deleted)
>> Oct 30 13:01:41 d064 kernel: hfi1 0000:4b:00.0: hfi1_0: 8051: Link up
>> Oct 30 13:01:41 d064 kernel: hfi1 0000:4b:00.0: hfi1_0: set_link_state:
>> current GOING_UP, new INIT (LINKUP)
>> Oct 30 13:01:41 d064 kernel: hfi1 0000:4b:00.0: hfi1_0: physical state
>> changed to PHYS_LINKUP (0x5), phy 0x50
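The NHC error says no port is ACTIVE at LinkUp 100 Gb/sec, while the kernel messages show hfi1_0 only reaching INIT at about that time. The per-port sysfs attributes make that state visible directly. A small diagnostic sketch, assuming the standard /sys/class/infiniband/<device>/ports/<port>/ layout; it is not how NHC's check_hw_ib is implemented, only a way to look at the same kind of information.

```shell
#!/bin/bash
# Diagnostic sketch: print state, phys_state and rate for every RDMA port
# under a sysfs root (default /sys/class/infiniband, e.g. hfi1_0/ports/1).
ib_port_summary() {
    local root="${1:-/sys/class/infiniband}"
    local port
    for port in "$root"/*/ports/*; do
        # Skip the unexpanded glob when no device is present.
        [ -d "$port" ] || continue
        # Each attribute is a single line such as "4: ACTIVE" or "5: LinkUp".
        printf '%s state=%s phys_state=%s rate=%s\n' \
            "${port#"$root"/}" \
            "$(cat "$port/state" 2>/dev/null)" \
            "$(cat "$port/phys_state" 2>/dev/null)" \
            "$(cat "$port/rate" 2>/dev/null)"
    done
}

ib_port_summary "$@"
```

Run without arguments right after boot, it would show whether hfi1_0 port 1 is still in INIT rather than ACTIVE at the moment slurmd starts.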
>>
>> I tried to delay the NetworkManager "network-online.target" by setting a
>> wait on the ib0 device and rebooting, but that seems to be ignored:
>>
>> $ nmcli -p connection modify "System ib0" \
>>     connection.connection.wait-device-timeout 20
>>
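For reference, the NetworkManager settings documentation (nm-settings-nmcli) lists this property as `connection.wait-device-timeout`, with the value in milliseconds, so `20` would mean 20 ms rather than 20 seconds. As documented, it only delays network-online until the device appears and is managed, which may not cover the OPA port reaching ACTIVE. A hedged sketch of checking and setting it for the "System ib0" profile:

```shell
# Print the current value in milliseconds (-1, the default, currently means no wait).
nmcli -g connection.wait-device-timeout connection show "System ib0"
# Wait up to 20 seconds (20000 ms) for the ib0 device during boot.
nmcli connection modify "System ib0" connection.wait-device-timeout 20000
```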
>> I'm hoping that other sites using Omni-Path have seen this and maybe can
>> share a fix or workaround?
>>
>> Of course we could remove the Infiniband check in Node Health Check
>> (NHC), but that would not really be acceptable during operations.
>>
>> Thanks for sharing any insights,
>> Ole
>>
> --
> Max Rutkowski
> IT-Services und IT-Betrieb
> Tel.: +49 (0)331/6264-2341
> E-Mail: max.rutkowski at gfz-potsdam.de
> ___________________________________
>
> Helmholtz-Zentrum Potsdam
> *Deutsches GeoForschungsZentrum GFZ*
> Stiftung des öff. Rechts Land Brandenburg
> Telegrafenberg, 14473 Potsdam