[slurm-users] [EXTERNAL] Re: Question about PMIX ERROR messages being emitted by some child of srun process
Pritchard Jr., Howard
howardp at lanl.gov
Tue May 23 17:33:46 UTC 2023
Thanks Christopher,
This doesn't seem to be related to Open MPI at all, except that for our 5.0.0 and newer releases one has to use PMIx to talk to the job launcher.
I built MPICH 4.1 on Perlmutter using the --with-pmix option and see a similar message from srun --mpi=pmix:
hpp at nid008589:~/ompi/examples> (v5.0.x *)srun -u -n 2 --mpi=pmix ./hello_c
srun: Job 9369984 step creation temporarily disabled, retrying (Requested nodes are busy)
srun: Step created for StepId=9369984.2
[nid008589:104119] PMIX ERROR: OUT-OF-RESOURCE in file base/bfrop_base_unpack.c at line 750
[nid008593:11389] PMIX ERROR: OUT-OF-RESOURCE in file base/bfrop_base_unpack.c at line 750
Hello, world, I am 0 of 2, (MPICH Version: 4.1
I too noticed that if I set PMIX_DEBUG=1 the chatter from srun stops.
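For anyone who wants to compare the two cases, here is a minimal sketch that counts the PMIX ERROR lines in a run's output. It assumes the messages can be captured by merging stderr into stdout; count_pmix_errors is a helper name made up for illustration, not part of Slurm or PMIx, and I have not checked whether the debug output itself contains matching lines.

```shell
#!/bin/sh
# Hypothetical helper: count lines on stdin that contain the
# "PMIX ERROR:" prefix seen in the srun output above.
count_pmix_errors() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when there are none, so "|| true" keeps the pipeline status clean.
    grep -c 'PMIX ERROR:' || true
}

# Intended usage on a Slurm system (assumed, not run here):
#   srun -u -n 2 --mpi=pmix ./hello_c 2>&1 | count_pmix_errors
#   PMIX_DEBUG=1 srun -u -n 2 --mpi=pmix ./hello_c 2>&1 | count_pmix_errors
```

If the observation above holds, the first command should report a non-zero count and the second should report 0.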
Howard
On 5/22/23, 3:49 PM, "slurm-users on behalf of Christopher Samuel" <slurm-users-bounces at lists.schedmd.com on behalf of chris at csamuel.org> wrote:
Hi Tommi, Howard,
On 5/22/23 12:16 am, Tommi Tervo wrote:
> 23.02.2 contains a PMIx permission regression; it may be worth checking whether that is the case here.
I confirmed I could replicate the UNPACK-INADEQUATE-SPACE messages
Howard is seeing on a test system, so I tried that patch on that same
system without any change. :-(
Looking at the PMIx code base the messages appear to come from that code
(the triggers are in src/mca/bfrops/) and I saw I could set
PMIX_DEBUG=verbose to get more info on the problem, but when I set that
these messages go away entirely. :-/
Very odd.
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA