[slurm-users] issue with mpirun when using through slurm / pmix
Bas van der Vlies
bas.vandervlies at surf.nl
Fri Oct 22 06:35:08 UTC 2021
I have no other solution; this is what worked at our site.
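
As a rough sketch only, the workaround quoted below would be applied in the interactive shell opened by srun, before mpirun is called. The variable name and both values come from this thread. Leaving the mpirun line commented out is a choice made for this sketch, since it only makes sense inside a Slurm allocation.

```shell
#!/bin/sh
# Sketch: select the PMIx security (psec) component before launching mpirun.
# "native" is the first value suggested in the thread.
export PMIX_MCA_psec=native
# export PMIX_MCA_psec=none   # the other value suggested in the thread

# Confirm what child processes such as mpirun will see.
echo "PMIX_MCA_psec=${PMIX_MCA_psec}"

# Then, on the allocated compute node (needs a Slurm allocation, so commented out here):
# mpirun -c 2 /bin/hostname
```

The variable has to be set in the same shell that launches mpirun, because it is only inherited by processes started afterwards.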
> On 22 Oct 2021, at 03:19, pankajd <pankajd at cdac.in> wrote:
>
> Thanks, but after setting PMIX_MCA_psec=native, mpirun now hangs and produces no output.
>
> On October 21, 2021 at 9:21 PM Bas van der Vlies <bas.vandervlies at surf.nl> wrote:
> > At our site we also had this problem: the pmix lib was compiled with
> > munge support. We solved it by setting this environment variable:
> > * export PMIX_MCA_psec=native or export PMIX_MCA_psec=none
> >
> > Regards,
> >
> > Bas
> >
> > On 21/10/2021 16:59, Pankaj Dorlikar wrote:
> > > Hi,
> > >
> > > I am using slurm-20.11.7 compiled with pmix-3.2.3, and the job is
> > > submitted as follows:
> > >
> > > srun -N 1 -c 2 --pty /bin/bash
> > >
> > > On the allocated compute node, when I execute the command below, I get
> > > a PMIx error with return value -46:
> > >
> > > mpirun -c 2 /bin/hostname
> > >
> > > --------------------------------------------------------------------------
> > > A requested component was not found, or was unable to be opened. This
> > > means that this component is either not installed or is unable to be
> > > used on your system (e.g., sometimes this means that shared libraries
> > > that the component requires are unable to be found/loaded). Note that
> > > PMIX stopped checking at the first component that it did not find.
> > >
> > > Host: cnode9
> > > Framework: psec
> > > Component: munge
> > > --------------------------------------------------------------------------
> > > --------------------------------------------------------------------------
> > > It looks like pmix_init failed for some reason; your parallel process is
> > > likely to abort. There are many reasons that a parallel process can
> > > fail during pmix_init; some of which are due to configuration or
> > > environment problems. This failure appears to be an internal failure;
> > > here's some additional information (which may only be relevant to an
> > > PMIX developer):
> > >
> > > pmix_psec_base_open failed
> > > --> Returned value -46 instead of PMIX_SUCCESS
> > > --------------------------------------------------------------------------
> > > [cnode9:2708617] PMIX ERROR: NOT-FOUND in file server/pmix_server.c at line 237
> > >
> > >
> > > ------------------------------------------------------------------------------------------------------------
> > >
> > > [ C-DAC is on Social-Media too. Kindly follow us at:
> > > Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]
> > >
> > > This e-mail is for the sole use of the intended recipient(s) and may
> > > contain confidential and privileged information. If you are not the
> > > intended recipient, please contact the sender by reply e-mail and destroy
> > > all copies and the original message. Any unauthorized review, use,
> > > disclosure, dissemination, forwarding, printing or copying of this email
> > > is strictly prohibited and appropriate legal action will be taken.
> > > ------------------------------------------------------------------------------------------------------------
> > >
> >
> > --
> > Bas van der Vlies
> > | HPCV Supercomputing | Internal Services | SURF |
> > https://userinfo.surfsara.nl |
> > | Science Park 140 | 1098 XG Amsterdam | Phone: +31208001300 |
> > | bas.vandervlies at surf.nl