[slurm-users] How to delay the start of slurmd until Infiniband/OPA network is fully up?

Ward Poelmans ward.poelmans at vub.be
Wed Nov 1 19:09:15 UTC 2023


Hi,

We have a slightly different script to do the same. It only relies on /sys:

#!/bin/bash
# Search for InfiniBand devices and wait until
# at least one port reports that it is ACTIVE

if [[ ! -d /sys/class/infiniband ]]
then
     logger "No infiniband found"
     exit 0
fi

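# Collect the port state files; nullglob leaves the array empty if none exist
shopt -s nullglob
ports=( /sys/class/infiniband/*/ports/*/state )

# Poll once per second, for up to 300 seconds
for (( count = 0; count < 300; count++ ))
do
     for port in "${ports[@]}"; do
         if grep -q ACTIVE "$port"; then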
             logger "Infiniband online at $port"
             exit 0
         fi
     done
     sleep 1
done

logger "Failed to find an active infiniband interface"
exit 1
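
One way to hook a script like this into slurmd startup is a systemd drop-in with ExecStartPre. Below is a minimal sketch, not something from our setup: the script path /usr/local/sbin/wait-for-ib.sh, the function name install_dropin, the drop-in file name and the 330 second timeout are all assumptions for illustration, and it assumes slurmd runs as slurmd.service. As far as I know, ExecStartPre counts towards the unit's start timeout (90 seconds by default on most systems), which is shorter than the 300 seconds the loop may wait, so the sketch raises TimeoutStartSec as well.

```shell
#!/bin/bash
# Sketch: write a systemd drop-in that runs the wait script before slurmd starts.
# WAIT_SCRIPT is a hypothetical install path for the script above.
WAIT_SCRIPT=${WAIT_SCRIPT:-/usr/local/sbin/wait-for-ib.sh}

install_dropin() {
    # $1: drop-in directory; defaults to the usual location for slurmd.service
    local dir=${1:-/etc/systemd/system/slurmd.service.d}
    mkdir -p "$dir" || return 1
    # Unquoted heredoc delimiter so that $WAIT_SCRIPT is expanded into the file
    cat > "$dir/wait-for-ib.conf" <<EOF
[Service]
ExecStartPre=$WAIT_SCRIPT
TimeoutStartSec=330
EOF
}
```

Calling install_dropin as root and then running "systemctl daemon-reload" should make slurmd wait for the script on the next start. Because the script exits 0 when there is no /sys/class/infiniband, the same drop-in can be deployed on nodes without an HCA.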


Ward