[slurm-users] MPI_Init_thread error
Fatih Ertinaz
fertinaz at gmail.com
Mon Jul 24 17:41:59 UTC 2023
Hi Aziz,
This seems like an MPI environment issue rather than a Slurm problem.
Make sure that the MPI modules are loaded as well. You can see the list of
loaded modules via `module list`; this should tell you whether SU2's
dependencies are available in your runtime. If they are not pulled in
implicitly, you need to load them before you load SU2. You can then check
with commands like `which mpirun` or `mpirun -V` to confirm that you have a
proper MPI runtime environment.
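For instance, something like the following (the openmpi/4.0.3 module name
is just a guess based on your OpenMPI version; use whatever name
`module avail` actually reports on your cluster):

$ module list                       # what is loaded right now
$ module avail openmpi              # find the MPI module your SU2 build expects
$ module load openmpi/4.0.3 su2/7.5.1
$ which mpirun && mpirun -V         # confirm the MPI runtime is on your PATH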
By the way, even if your case runs fine, you won't benefit from MPI
because you're allocating a single task (--ntasks-per-node=1). Instead,
request the whole node and use all physical cores (or run a scalability
analysis first and decide based on that); see the sketch below.
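As a rough sketch, assuming 32 physical cores per node (adjust to your
hardware) and an OpenMPI build with Slurm/PMIx support:

$ srun -p defq --nodes=1 --ntasks-per-node=32 --time=01:00:00 SU2_CFD config.cfg

or, inside an exclusive allocation:

$ salloc -p defq --nodes=1 --exclusive --time=01:00:00
$ module load su2/7.5.1
$ srun SU2_CFD config.cfg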
Hope this helps
Fatih
On Mon, Jul 24, 2023 at 10:44 AM Aziz Ogutlu <aziz.ogutlu at eduline.com.tr>
wrote:
> Hi there all,
> We're using Slurm 21.08 on a Red Hat 7.9 HPC cluster with OpenMPI 4.0.3 +
> gcc 8.5.0.
> When we run the commands below to call SU2, we get an error message:
>
> $ srun -p defq --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i
> $ module load su2/7.5.1
> $ SU2_CFD config.cfg
>
> *** An error occurred in MPI_Init_thread
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [cnode003.hpc:17534] Local abort before MPI_INIT completed completed
> successfully, but am not able to aggregate error messages, and not able to
> guarantee that all other processes were killed!
>
> --
> Best regards,
> Aziz Öğütlü
>
> Eduline Bilişim Sanayi ve Ticaret Ltd. Şti. www.eduline.com.tr
> Merkez Mah. Ayazma Cad. No:37 Papirus Plaza
> Kat:6 Ofis No:118 Kağıthane - İstanbul - Türkiye 34406
> Tel : +90 212 324 60 61 Cep: +90 541 350 40 72
>
>