[slurm-users] How to delay the start of slurmd until Infiniband/OPA network is fully up?

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Thu Nov 2 08:28:28 UTC 2023


Hi Ward,

Thanks a lot for the feedback!  The method of probing 
/sys/class/infiniband/*/ports/*/state is also used in the NHC script 
lbnl_hw.nhc and has the advantage of not depending on the nmcli command 
from the NetworkManager package.

Can I ask how you run your script as a service in the systemd boot 
process, perhaps similar to Max's solution in 
https://github.com/maxlxl/network.target_wait-for-interfaces ?

Thanks,
Ole

On 11/1/23 20:09, Ward Poelmans wrote:
> We have a slightly different script to do the same. It only relies on /sys:
> 
> # Search for InfiniBand devices and wait until
> # at least one port reports that it is ACTIVE
> 
> if [[ ! -d /sys/class/infiniband ]]
> then
>      logger "No infiniband found"
>      exit 0
> fi
> 
> # Collect the state files with a glob instead of parsing ls output
> ports=(/sys/class/infiniband/*/ports/*/state)
> 
> for (( count = 0; count < 300; count++ ))
> do
>      for port in "${ports[@]}"; do
>          if grep -q ACTIVE "$port"; then
>              logger "Infiniband online at $port"
>              exit 0
>          fi
>      done
>      sleep 1
> done
> 
> logger "Failed to find an active infiniband interface"
> exit 1
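
On the systemd question above, one possible approach (a sketch only, not 
Ward's or Max's actual setup) is a oneshot unit that runs the wait script 
and is ordered before slurmd.service. The unit name 
wait-for-infiniband.service, the script path 
/usr/local/sbin/wait-for-infiniband.sh and the helper function 
write_ib_wait_unit are all assumptions made up for this example:

```shell
#!/bin/bash
# Sketch: write a oneshot unit that runs the wait script before slurmd starts.
# The unit name, the default script path and this function name are
# hypothetical; adjust them to the local site.
write_ib_wait_unit() {
    local unit_dir="${1:-/etc/systemd/system}"
    local script="${2:-/usr/local/sbin/wait-for-infiniband.sh}"
    # Unquoted EOF so that $script is expanded into the unit file.
    cat > "$unit_dir/wait-for-infiniband.service" <<EOF
[Unit]
Description=Wait until an InfiniBand/OPA port is ACTIVE
# Ordering only: slurmd.service is held back until this unit has finished.
Before=slurmd.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=$script

[Install]
WantedBy=multi-user.target
EOF
}
```

Calling write_ib_wait_unit with no arguments writes the unit to 
/etc/systemd/system; "systemctl daemon-reload" followed by 
"systemctl enable wait-for-infiniband.service" would then pull it in at the 
next boot. Note that Before= is an ordering dependency only, so slurmd still 
starts if the script exits non-zero after its 300 attempts. Blocking slurmd 
in that case would additionally need something like a Requires= entry in a 
slurmd.service drop-in.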


