Hi All,
We're setting up Slurm on a RHEL 8.9-based cluster. Here's a summary of the steps we've taken so far:
1. Installed MLNX OFED ConnectX-5.2.
2. Compiled and installed PMIx and UCX.
3. Compiled and installed Slurm with PMIx v4 and UCX support.
4. Compiled OpenMPI with Slurm, PMIx, libevent, and hwloc support.
5. All compute nodes are reachable via the IB network.
*Problem:* Hello-world MPI jobs run fine across multiple nodes, but they are not using InfiniBand. The job is launched as follows:
srun --mpi=pmix -N2 -n2 --ntasks-per-node=2 ./hello > log.out 2>&1
Output from srun --mpi=list:
MPI plugin types are...
        none
        cray_shasta
        pmi2
        pmix
specific pmix plugin versions available: pmix_v4
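For reference, here is a minimal sketch of the checks we can run on a compute node to narrow this down. It rests on several assumptions that are not confirmed above: that `ibstat`, `ucx_info`, and `ompi_info` are on the PATH, that OpenMPI is expected to reach InfiniBand through its UCX PML, and that `rc,sm,self` is a sensible UCX transport list for this fabric. `check_tool` is a hypothetical helper, not part of Slurm, UCX, or OpenMPI. The launch uses `-N2 -n2` (one task per node) so the traffic has to cross the network.

```shell
#!/bin/sh
# Sketch only: each step is skipped if the tool it needs is not on PATH.

# Hypothetical helper (not part of Slurm, UCX, or OpenMPI).
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

# 1. Are the IB ports up? (ibstat ships with infiniband-diags)
if check_tool ibstat; then
    ibstat | grep -E 'State|Rate'
fi

# 2. Does UCX see the IB devices and transports?
if check_tool ucx_info; then
    ucx_info -d | grep -E 'Transport|Device'
fi

# 3. Was OpenMPI built with UCX components?
if check_tool ompi_info; then
    ompi_info | grep -i ucx || echo "no UCX components found in this OpenMPI build"
fi

# 4. Request the UCX PML explicitly so the job should fail instead of
#    quietly falling back to another transport such as TCP.
#    The UCX_TLS value is an assumption; adjust it for the fabric.
if check_tool srun && [ -x ./hello ]; then
    OMPI_MCA_pml=ucx UCX_TLS=rc,sm,self \
        srun --mpi=pmix -N2 -n2 ./hello
else
    echo "srun or ./hello not available; skipping the launch test"
fi
```

If step 3 prints no UCX components, the OpenMPI build itself would be the first thing to revisit, since UCX is not listed among the OpenMPI configure options above.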
Could someone please point me in the right direction on how to troubleshoot this issue?
Thank you for your assistance.
Sudhakar