[slurm-users] How to delay the start of slurmd until Infiniband/OPA network is fully up?

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Mon Oct 30 14:11:32 UTC 2023


Hi Max,

Thanks so much for your fast response with a solution!  I didn't know that 
NetworkManager (falsely) claims that the network is online as soon as the 
first interface comes up :-(

Your solution of a wait-for-interfaces Systemd service makes a lot of 
sense, and I'm going to try it out.

Best regards,
Ole
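
A minimal sketch of the kind of wait-for-interface helper described in the
quoted message below. It is not taken from Max's GitHub repository: the
function name wait_for_iface, the SYSFS_NET variable and the 60-second
default are assumptions made purely for illustration. It polls the kernel's
operstate file for the interface until it reads "up":

```shell
#!/bin/sh
# Hypothetical helper, not the code from the linked repository.
# SYSFS_NET is overridable only so the function can be tried without real hardware.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# wait_for_iface IFACE [TIMEOUT_SECONDS]
# Returns 0 as soon as IFACE reports operstate "up",
# or 1 if it is still not up after TIMEOUT_SECONDS (default 60).
wait_for_iface() {
    iface="$1"
    timeout="${2:-60}"
    elapsed=0
    while :; do
        # A missing interface simply yields an empty state.
        state="$(cat "$SYSFS_NET/$iface/operstate" 2>/dev/null || true)"
        if [ "$state" = "up" ]; then
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

One possible way to wire this in (again an assumption, not a description of
Max's setup) would be to call "wait_for_iface ib0 120" from a oneshot systemd
unit ordered before the network targets, or from an ExecStartPre= line in a
drop-in for slurmd.service. Note that systemd's start timeout also applies to
ExecStartPre= commands, so the wait should stay below it.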

On 10/30/23 14:30, Max Rutkowski wrote:
> Hi,
> 
> we're not using Omni-Path, but we also had issues with Infiniband taking 
> too long to come up and slurmd failing to start because of that.
> 
> Our solution was to implement a little wait-for-interface systemd service 
> which delays the network.target until the ib interface has come up.
> 
> We found that NetworkManager triggers the network-online.target as soon 
> as the first interface is connected.
> 
> I've put the solution we use on my GitHub: 
> https://github.com/maxlxl/network.target_wait-for-interfaces
> 
> You may need to do small adjustments, but it's pretty straightforward in 
> general.
-- 
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: Ole.H.Nielsen at fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620
> 
> 
> Kind regards
> Max
> 
> On 30.10.23 13:50, Ole Holm Nielsen wrote:
>> I'm fighting this strange scenario where slurmd is started before the 
>> Infiniband/OPA network is fully up.  The Node Health Check (NHC) 
>> executed by slurmd then fails the node (as it should).  This happens 
>> only on EL8 Linux (AlmaLinux 8.8) nodes, whereas our CentOS 7.9 nodes 
>> with Infiniband/OPA network work without problems.
>>
>> Question: Does anyone know how to reliably delay the start of the slurmd 
>> Systemd service until the Infiniband/OPA network is fully up?
>>
>> Note: Our Infiniband/OPA network fabric is Omni-Path 100 Gbit/s, not 
>> Mellanox IB.  On AlmaLinux 8.8 we use the in-distro OPA drivers since 
>> the CornelisNetworks drivers are not available for RHEL 8.8.
>>
>> The details:
>>
>> The slurmd service is started by the service file 
>> /usr/lib/systemd/system/slurmd.service after the "network-online.target" 
>> has been reached.
>>
>> It seems that NetworkManager reports "network-online.target" BEFORE the 
>> Infiniband/OPA device ib0 is actually up, and this seems to be the cause 
>> of our problems!
>>
>> Here are some important sequences of events from the syslog showing that 
>> the network goes online before the Infiniband/OPA network (hfi1_0 
>> adapter) is up:
>>
>> Oct 30 13:01:40 d064 systemd[1]: Reached target Network is Online.
>> (lines deleted)
>> Oct 30 13:01:41 d064 slurmd[2333]: slurmd: error: health_check failed: 
>> rc:1 output:ERROR:  nhc:  Health check failed: check_hw_ib:  No IB port 
>> is ACTIVE (LinkUp 100 Gb/sec).
>> (lines deleted)
>> Oct 30 13:01:41 d064 kernel: hfi1 0000:4b:00.0: hfi1_0: 8051: Link up
>> Oct 30 13:01:41 d064 kernel: hfi1 0000:4b:00.0: hfi1_0: set_link_state: 
>> current GOING_UP, new INIT (LINKUP)
>> Oct 30 13:01:41 d064 kernel: hfi1 0000:4b:00.0: hfi1_0: physical state 
>> changed to PHYS_LINKUP (0x5), phy 0x50
>>
>> I tried to delay the NetworkManager "network-online.target" by setting a 
>> wait on the ib0 device and reboot, but that seems to be ignored:
>>
>> $ nmcli -p connection modify "System ib0" connection.wait-device-timeout 20
>>
>> I'm hoping that other sites using Omni-Path have seen this and maybe can 
>> share a fix or workaround?
>>
>> Of course we could remove the Infiniband check in Node Health Check 
>> (NHC), but that would not really be acceptable during operations.
>>
>> Thanks for sharing any insights,
>> Ole
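
The NHC error quoted above complains that no IB port is ACTIVE, so another
option is to wait for the port state itself rather than for the ib0 network
device. The sketch below assumes the hfi1 adapter exposes its port state
under /sys/class/infiniband/<device>/ports/<port>/state with the usual
"4: ACTIVE" text, as other RDMA devices do. The function names, the IB_SYSFS
variable and the default timeout are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper for illustration; names and defaults are assumptions.
# IB_SYSFS is overridable only so the functions can be tried without hardware.
IB_SYSFS="${IB_SYSFS:-/sys/class/infiniband}"

# ib_port_active DEVICE [PORT]
# Succeeds only if the port's sysfs state file reads exactly "4: ACTIVE".
ib_port_active() {
    state_file="$IB_SYSFS/$1/ports/${2:-1}/state"
    [ -r "$state_file" ] || return 1
    read -r port_state < "$state_file" || true
    case "$port_state" in
        "4: ACTIVE") return 0 ;;
        *) return 1 ;;
    esac
}

# wait_for_ib_port DEVICE [PORT] [TIMEOUT_SECONDS]
# Polls once per second; returns 0 when the port is ACTIVE, 1 on timeout.
wait_for_ib_port() {
    w_dev="$1"
    w_port="${2:-1}"
    w_timeout="${3:-60}"
    w_elapsed=0
    while :; do
        if ib_port_active "$w_dev" "$w_port"; then
            return 0
        fi
        if [ "$w_elapsed" -ge "$w_timeout" ]; then
            return 1
        fi
        sleep 1
        w_elapsed=$((w_elapsed + 1))
    done
}
```

For the node in the log above this might be invoked as
"wait_for_ib_port hfi1_0 1 120" before slurmd is started. The device name
comes from the kernel messages, while the port number and timeout are
guesses that would need checking on the actual hardware.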
>>
> -- 
> Max Rutkowski
> IT-Services und IT-Betrieb
> Tel.: +49 (0)331/6264-2341
> E-Mail: max.rutkowski at gfz-potsdam.de
> ___________________________________
> 
> Helmholtz-Zentrum Potsdam
> *Deutsches GeoForschungsZentrum GFZ*
> Stiftung des öff. Rechts Land Brandenburg
> Telegrafenberg, 14473 Potsdam


