[slurm-users] Slurm-17.11.5 + Pmix-2.1.1/Debugging

Bill Broadley bill at cse.ucdavis.edu
Tue May 8 19:59:16 MDT 2018


On 05/08/2018 05:33 PM, Christopher Samuel wrote:
> On 09/05/18 10:23, Bill Broadley wrote:
>
>> It's possible of course that it's entirely an openmpi problem, I'll
>> be investigating and posting there if I can't find a solution.
>
> One of the changes in OMPI 3.1.0 was:
>
> - Update PMIx to version 2.1.1.
>
> So I'm wondering if previous versions were falling back to PMIx v1
> support in PMIx v2 whereas now it's trying to use v2?
>
> Could explain that OPAL_PMIX_V1 definition in previous versions.
>
> What is your default MPI type?
>
> scontrol show config | fgrep MpiDefault
>
> If it's not pmix_v2 then you'll probably need to specify that, you can
> test with "srun --mpi=pmix_v2".

Ha, very good catch, thank you:

bill@headnode:~/relay$ srun --mpi=pmix_v2 -N 2 -n 2 -t 1 ./r2 1
c2-31 c2-33
size=     1,  16384 hops,  2 nodes in   0.02 sec (  1.28 us/hop)   3046 KB/sec

I updated the system-wide MpiDefault and now it works great without the --mpi=
option.
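
For the archives, the change amounts to something like the following in
slurm.conf. This is a sketch rather than a copy of the actual config: it
assumes the plugin name is pmix_v2, the same value that worked with --mpi=
above, and that the Slurm daemons re-read the config afterward.

```
# slurm.conf (sketch; assumes the PMIx v2 plugin is installed as pmix_v2)
MpiDefault=pmix_v2
```

Once the change is picked up, "scontrol show config | fgrep MpiDefault" should
report pmix_v2, and a plain srun should behave like the --mpi=pmix_v2 run above.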

It tends to be the simple things that get me.  Thanks again.



