[slurm-users] How to delay the start of slurmd until Infiniband/OPA network is fully up?
Ward Poelmans
ward.poelmans at vub.be
Wed Nov 1 19:09:15 UTC 2023
Hi,
We have a slightly different script that does the same thing. It relies only on /sys:
#!/bin/bash
# Search for InfiniBand devices and wait (up to 300 s) until
# at least one port reports that it is ACTIVE.
if [[ ! -d /sys/class/infiniband ]]
then
    logger "No infiniband found"
    exit 0
fi

# Each port exposes its link state in a file such as
# /sys/class/infiniband/<device>/ports/<n>/state (e.g. "4: ACTIVE").
shopt -s nullglob
ports=(/sys/class/infiniband/*/ports/*/state)

for (( count = 0; count < 300; count++ ))
do
    for port in "${ports[@]}"; do
        if grep -q ACTIVE "$port"; then
            logger "Infiniband online at $port"
            exit 0
        fi
    done
    sleep 1
done

logger "Failed to find an active infiniband interface"
exit 1
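The post does not say how the script is hooked into slurmd. On systemd hosts, one possible way is a drop-in for slurmd.service that runs the script as an ExecStartPre step. The sketch below assumes that approach. The function name install_ib_dropin, the drop-in file name wait-for-ib.conf and the default script path /usr/local/sbin/wait-for-ib.sh are all hypothetical. Because the loop above can wait up to 300 s, which is longer than systemd's usual 90 s default start timeout, the drop-in also raises TimeoutStartSec.

```shell
#!/bin/bash
# Sketch: write a systemd drop-in so that slurmd runs an InfiniBand wait
# script before it starts.
# Hypothetical names (assumptions, not from the post): install_ib_dropin,
# wait-for-ib.conf and the default path /usr/local/sbin/wait-for-ib.sh.
install_ib_dropin() {
    local unit_dir="${1:-/etc/systemd/system}"
    local wait_script="${2:-/usr/local/sbin/wait-for-ib.sh}"
    local dropin_dir="${unit_dir}/slurmd.service.d"

    mkdir -p "${dropin_dir}" || return 1
    # Unquoted EOF so that ${wait_script} is expanded into the file.
    cat > "${dropin_dir}/wait-for-ib.conf" <<EOF
[Service]
# Runs before slurmd; a non-zero exit marks the start as failed.
ExecStartPre=${wait_script}
# The wait loop can take up to 300 s, so allow some headroom.
TimeoutStartSec=330
EOF
}
```

On a node, one would install the wait script at the chosen path, call `install_ib_dropin`, and then run `systemctl daemon-reload` so systemd picks up the drop-in. If slurmd should still start when no port becomes ACTIVE, the systemd `-` prefix (`ExecStartPre=-/path/to/script`) makes systemd ignore the script's failure.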
Ward