Hello

The build may depend on the path of the pmix installation, which you did not set. Even if pmix is installed in a standard location, I think you have to set the path explicitly. The spec file comment reads:

# --with pmix           %%_with_pmix path       require pmix support


Here is what we have in our spec file, as an example:

# Build with pmix by default on linux
%bcond_without pmix
%global pmix_path /opt/pmix4/4.2.2
%if %{with pmix}
BuildRequires: pmix4
Requires: pmix4
Conflicts: pmix-libpmi
    %{?with_pmix:--with-pmix=%{pmix_path}} \
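
With the stock spec file, the same effect can probably be had without editing it, by defining the _with_pmix macro on the rpmbuild command line. The sketch below only prints the command so it can be checked before running. The /usr prefix is an assumption (where a distribution pmix-devel package would normally install), and build_cmd is just a helper name made up for this example.

```shell
#!/bin/sh
# Sketch: compose an rpmbuild command that hands an explicit PMIx prefix to
# Slurm's configure step through the _with_pmix macro.
# PMIX_PREFIX and TARBALL are placeholders (assumptions); adjust to your site.
PMIX_PREFIX="${PMIX_PREFIX:-/usr}"
TARBALL="${TARBALL:-slurm-20.11.9.tar.bz2}"

build_cmd() {
    # $1 = PMIx install prefix, $2 = Slurm tarball.
    # Print the command instead of running it, so it can be reviewed first.
    printf "rpmbuild -ta --with mysql --with hwloc --define '_with_pmix --with-pmix=%s' %s\n" "$1" "$2"
}

build_cmd "$PMIX_PREFIX" "$TARBALL"
```

If the printed command looks right, it can be pasted into the shell as is. With our layout the prefix would be /opt/pmix4/4.2.2 instead of /usr.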

Regards

Regine Gaudin



From: Patrick Begou via slurm-users <slurm-users@lists.schedmd.com>
Sent: Wednesday, February 14, 2024 18:02
To: Slurm User Community List
Subject: [slurm-users] Problem building slurm with PMIx
 
Hi !

I manage a small CentOS8 cluster using slurm-20.11.7-1 and
OpenMPI built from sources.
- I know this OS is not maintained any more and I need to negotiate
downtime to reinstall.
- I know Slurm 20.11.7 has security issues (I built it from source
some years ago with rpmbuild -ta --with mysql --with hwloc
slurm-20.11.7.tar.bz2) and I should update.

All was running fine until I added a GPU node and the Nvidia SDK. This SDK
provides a GPU-aware Open MPI 3 implementation, but I'm unable to launch
an intranode parallel job with it using srun:

--------------------------------------------------------------------------
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

   version 16.05 or later: you can use SLURM's PMIx support. This
   requires that you configure and build SLURM --with-pmix.

   Versions earlier than 16.05: you must use either SLURM's PMI-1 or
   PMI-2 support. SLURM builds PMI-1 by default, or you can manually
   install PMI-2. You must then build Open MPI using --with-pmi pointing
   to the SLURM PMI library location.

Please configure as appropriate and try again.
--------------------------------------------------------------------------

I checked with "srun --mpi=list" and pmix is not listed:
srun: MPI types are...
srun: pmi2
srun: cray_shasta
srun: none

So I decided to build the RPMs from slurm-20.11.9.tar.bz2, as I had done
previously for 20.11.7, and update.
I first installed pmix-2.1.1-1 from the source RPM, as I had no pmix-devel rpm in
my local CentOS8 repo:
rpmbuild --rebuild pmix-2.1.1-1.el8.src.rpm
dnf install pmix-devel-2.1.1-1.el8.x86_64.rpm pmix-2.1.1-1.el8.x86_64.rpm

Then I built Slurm from slurm-20.11.9.tar.bz2 (just changing python3 to
python38 in the spec file):
rpmbuild -ta --with mysql --with hwloc --with pmix slurm-20.11.9.tar.bz2

And then I tried to install these packages on the GPU node:
dnf install slurm-slurmd-20.11.9-1.el8.x86_64.rpm \
    slurm-20.11.9-1.el8.x86_64.rpm slurm-devel-20.11.9-1.el8.x86_64.rpm \
    slurm-libpmi-20.11.9-1.el8.x86_64.rpm

But I get this strange error:

Error:
  Problem: conflicting requests
   - nothing provides pmix = 20.11.9 needed by
slurm-slurmd-20.11.9-1.el8.x86_64
(try to add '--skip-broken' to skip uninstallable packages or '--nobest'
to use not only best candidate packages)

Why is there a requirement on pmix carrying the Slurm version number? Am I
doing something wrong somewhere?
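
One way to see where such a requirement comes from is to compare what the built package requires with what the installed pmix package provides. The sketch below is only an illustration: pmix_deps is a made-up helper name, and the input fed to it is the dependency quoted in the dnf error rather than real rpm output. The rpm options shown in the comments (-qp, --requires, --provides) are standard query options.

```shell
#!/bin/sh
# Sketch: filter a list of RPM dependencies (one per line on stdin)
# down to the ones whose name starts with "pmix".
pmix_deps() {
    grep -i '^pmix'
}

# Against the real packages one might run, for example:
#   rpm -qp --requires slurm-slurmd-20.11.9-1.el8.x86_64.rpm | pmix_deps
#   rpm -q --provides pmix | pmix_deps
# Here the filter is fed the dependency reported by dnf, plus an
# unrelated entry that it should drop.
printf '%s\n' 'pmix = 20.11.9' 'libc.so.6()(64bit)' | pmix_deps
```

If the "requires" side shows a pmix version that the "provides" side does not offer, the mismatch was written into the package at build time, so the fix belongs in the spec file or the rpmbuild options rather than in dnf.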


Thanks for your help


Patrick

