[slurm-users] issue with mpirun when using through slurm / pmix
pankajd
pankajd at cdac.in
Fri Oct 22 01:19:12 UTC 2021
Thanks, but after setting PMIX_MCA_psec=native, mpirun now hangs and does not
produce any output.
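
For anyone trying to reproduce this, below is a minimal sketch of how the two suggested values could be tried one at a time without waiting on a hang. The helper name run_with_psec and the time limit are my own assumptions, not something from Slurm or PMIx; it only relies on timeout(1) from GNU coreutils, which exits with status 124 when the limit is reached.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm, PMIx or Open MPI): run a command
# with a chosen PMIX_MCA_psec value and give up after a time limit.
# usage: run_with_psec <psec-value> <seconds> <command> [args...]
run_with_psec() {
    psec=$1
    limit=$2
    shift 2
    # The variable is set only for this one command. timeout(1) from GNU
    # coreutils exits with status 124 if the limit is reached.
    PMIX_MCA_psec="$psec" timeout "$limit" "$@"
}
```

It could be called inside the allocation as, for example, "run_with_psec native 30 mpirun -c 2 /bin/hostname; echo $?" and then again with "none". A status of 124 would indicate the hang rather than an immediate pmix_init failure.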
On October 21, 2021 at 9:21 PM Bas van der Vlies <bas.vandervlies at surf.nl>
wrote:
> On our side we also had this problem, where the pmix lib was compiled with
> munge support. We solved it by setting this environment variable:
> * export PMIX_MCA_psec=native or export PMIX_MCA_psec=none
>
> Regards,
>
> Bas
>
> On 21/10/2021 16:59, Pankaj Dorlikar wrote:
> > Hi,
> >
> > I am using slurm-20.11.7 compiled with pmix-3.2.3, and the job is
> > submitted as below:
> >
> > srun -N 1 -c 2 --pty /bin/bash
> >
> > On the allocated compute node, when I execute the command below, I get
> > a PMIX error with return value -46:
> >
> > mpirun -c 2 /bin/hostname
> >
> > --------------------------------------------------------------------------
> > A requested component was not found, or was unable to be opened. This
> > means that this component is either not installed or is unable to be
> > used on your system (e.g., sometimes this means that shared libraries
> > that the component requires are unable to be found/loaded). Note that
> > PMIX stopped checking at the first component that it did not find.
> >
> > Host: cnode9
> > Framework: psec
> > Component: munge
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > It looks like pmix_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during pmix_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > PMIX developer):
> >
> > pmix_psec_base_open failed
> > --> Returned value -46 instead of PMIX_SUCCESS
> > --------------------------------------------------------------------------
> > [cnode9:2708617] PMIX ERROR: NOT-FOUND in file server/pmix_server.c at line 237
> >
> >
> > ------------------------------------------------------------------------------------------------------------
> >
> > [ C-DAC is on Social-Media too. Kindly follow us at:
> > Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]
> >
> > This e-mail is for the sole use of the intended recipient(s) and may
> > contain confidential and privileged information. If you are not the
> > intended recipient, please contact the sender by reply e-mail and destroy
> > all copies and the original message. Any unauthorized review, use,
> > disclosure, dissemination, forwarding, printing or copying of this email
> > is strictly prohibited and appropriate legal action will be taken.
> > ------------------------------------------------------------------------------------------------------------
> >
>
> --
> Bas van der Vlies
> | HPCV Supercomputing | Internal Services | SURF |
> https://userinfo.surfsara.nl |
> | Science Park 140 | 1098 XG Amsterdam | Phone: +31208001300 |
> | bas.vandervlies at surf.nl
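
The error quoted above names the psec framework and the munge component, so it may help to see which psec components are actually present on the compute node. Here is a rough sketch, assuming the PMIx build installed its components as mca_psec_<name>.so files in a single directory; the function name list_psec_components is made up for illustration, and the directory has to be supplied by whoever runs it.

```shell
#!/bin/sh
# Hypothetical helper: print the names of psec components found as
# mca_psec_<name>.so files in the directory given as $1.
# The file naming is an assumption about how this PMIx build was installed.
list_psec_components() {
    dir=$1
    for f in "$dir"/mca_psec_*.so; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$f" ] || continue
        name=${f##*/mca_psec_}
        echo "${name%.so}"
    done
}
```

It would be called with the component directory of the PMIx install, for example "list_psec_components /usr/lib64/pmix" (that path is only a guess and will differ per site). If munge is listed, running ldd on that file would show whether one of the shared libraries it needs cannot be found, which is one of the causes the error text itself mentions.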