[slurm-users] [EXTERNAL] Re: Question about PMIX ERROR messages being emitted by some child of srun process

Pritchard Jr., Howard howardp at lanl.gov
Tue May 23 17:33:46 UTC 2023


Thanks Christopher,

This doesn't seem to be related to Open MPI at all, except that our 5.0.0 and newer releases have to use PMIx to talk to the job launcher.
I built MPICH 4.1 on Perlmutter using the --with-pmix option and see a similar message from srun --mpi=pmix:

hpp at nid008589:~/ompi/examples> (v5.0.x *)srun -u -n 2 --mpi=pmix ./hello_c
srun: Job 9369984 step creation temporarily disabled, retrying (Requested nodes are busy)
srun: Step created for StepId=9369984.2
[nid008589:104119] PMIX ERROR: OUT-OF-RESOURCE in file base/bfrop_base_unpack.c at line 750
[nid008593:11389] PMIX ERROR: OUT-OF-RESOURCE in file base/bfrop_base_unpack.c at line 750
Hello, world, I am 0 of 2, (MPICH Version:      4.1

I too noticed that if I set PMIX_DEBUG=1 the chatter from srun stops.  
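
For anyone who wants to repeat the comparison, here is a minimal sketch that counts the PMIX ERROR lines with and without PMIX_DEBUG=1. It assumes the same allocation and ./hello_c binary as above; count_pmix_errors is a hypothetical helper, not part of Slurm or PMIx.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm or PMIx): count the lines on stdin
# that carry the "PMIX ERROR:" marker seen in the output above.
# grep -c exits non-zero when nothing matches, so "|| true" keeps the
# pipeline usable under "set -e" while still printing 0.
count_pmix_errors() {
    grep -c 'PMIX ERROR:' || true
}

# Only attempt the runs where srun and the example binary are present,
# i.e. inside an existing job allocation (assumption).
if command -v srun >/dev/null 2>&1 && [ -x ./hello_c ]; then
    echo "default:      $(srun -u -n 2 --mpi=pmix ./hello_c 2>&1 | count_pmix_errors)"
    echo "PMIX_DEBUG=1: $(PMIX_DEBUG=1 srun -u -n 2 --mpi=pmix ./hello_c 2>&1 | count_pmix_errors)"
fi
```

If the behaviour described here holds, the first count should be non-zero and the second should be zero.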

Howard


On 5/22/23, 3:49 PM, "slurm-users on behalf of Christopher Samuel" <slurm-users-bounces at lists.schedmd.com on behalf of chris at csamuel.org> wrote:


Hi Tommi, Howard,


On 5/22/23 12:16 am, Tommi Tervo wrote:


> 23.02.2 contains a PMIx permission regression; it may be worth checking whether that is the case here.


I confirmed on a test system that I could replicate the
UNPACK-INADEQUATE-SPACE messages Howard is seeing, so I tried that patch
on the same system, but it made no difference. :-(


Looking at the PMIx code base, the messages appear to come from that code
(the triggers are in src/mca/bfrops/). I saw I could set
PMIX_DEBUG=verbose to get more info on the problem, but when I set that,
these messages go away entirely. :-/


Very odd.


-- 
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA