Hello
It might be related to the path of your PMIx installation, which you did not set. Even if PMIx is installed in a standard location, I think you have to set it explicitly, as the spec file comment says:
# --with pmix %%_with_pmix path require pmix support
Here is what we have in our own spec file, as an example:
# Build with pmix by default on linux
%bcond_without pmix
%global pmix_path /opt/pmix4/4.2.2
%if %{with pmix}
BuildRequires: pmix4
Requires: pmix4
Conflicts: pmix-libpmi
%endif
and then, among the configure options:
%{?with_pmix:--with-pmix=%{pmix_path}} \
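Applied to your build, a sketch of what I mean could look like the command below. It assumes that, as with any rpmbuild "--with" option, "--with pmix" simply defines the macro "_with_pmix" as "--with-pmix", so you can define that macro yourself with a path instead. The prefix /usr is only an assumption about where your rebuilt pmix-2.1.1 RPM installed its files; replace it with your actual PMIx prefix.

```shell
# Hypothetical PMIx prefix: adjust it to your actual installation
PMIX_PREFIX=/usr

# Define _with_pmix with an explicit path instead of passing "--with pmix".
# Do not pass both, since they set the same macro.
rpmbuild -ta --with mysql --with hwloc \
    --define "_with_pmix --with-pmix=${PMIX_PREFIX}" \
    slurm-20.11.9.tar.bz2
```

If you are unsure of the prefix, "rpm -ql pmix-devel" lists the files of the installed package; the prefix should be the directory above the one containing pmix.h.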
Regards
Regine Gaudin
From: Patrick Begou via slurm-users <slurm-users@lists.schedmd.com>
Sent: Wednesday, February 14, 2024 18:02
To: Slurm User Community List
Subject: [slurm-users] Problem building slurm with PMIx
Hi!
I manage a small CentOS 8 cluster running Slurm 20.11.7-1 and
Open MPI built from source.
- I know this OS is no longer maintained, and I need to negotiate
downtime to reinstall it.
- I know Slurm 20.11.7 has security issues (I built it from source
some years ago with rpmbuild -ta --with mysql --with hwloc
slurm-20.11.7.tar.bz2) and that I should update.
Everything was running fine until I added a GPU node and the NVIDIA SDK.
This SDK provides a GPU-aware Open MPI 3 implementation, but I'm unable
to launch an intra-node parallel job with it using srun:
--------------------------------------------------------------------------
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:
version 16.05 or later: you can use SLURM's PMIx support. This
requires that you configure and build SLURM --with-pmix.
Versions earlier than 16.05: you must use either SLURM's PMI-1 or
PMI-2 support. SLURM builds PMI-1 by default, or you can manually
install PMI-2. You must then build Open MPI using --with-pmi pointing
to the SLURM PMI library location.
Please configure as appropriate and try again.
--------------------------------------------------------------------------
I checked with "srun --mpi=list" and there is no pmix:
srun: MPI types are...
srun: pmi2
srun: cray_shasta
srun: none
So I decided to build the RPMs from slurm-20.11.9.tar.bz2, as I had
done previously for 20.11.7, and update.
I first installed pmix-2.1.1-1 from the source RPM, as there was no
pmix-devel RPM in my local CentOS 8 repository:
rpmbuild --rebuild pmix-2.1.1-1.el8.src.rpm
dnf install pmix-devel-2.1.1-1.el8.x86_64.rpm pmix-2.1.1-1.el8.x86_64.rpm
Then I built Slurm from slurm-20.11.9.tar.bz2 (only changing python3 to
python38 in the spec file):
rpmbuild -ta --with mysql --with hwloc --with pmix slurm-20.11.9.tar.bz2
and then tried to install these packages on the GPU node:
dnf install slurm-slurmd-20.11.9-1.el8.x86_64.rpm slurm-20.11.9-1.el8.x86_64.rpm slurm-devel-20.11.9-1.el8.x86_64.rpm slurm-libpmi-20.11.9-1.el8.x86_64.rpm
But I get this strange error:
Error:
Problem: conflicting requests
- nothing provides pmix = 20.11.9 needed by
slurm-slurmd-20.11.9-1.el8.x86_64
(try to add '--skip-broken' to skip uninstallable packages or '--nobest'
to use not only best candidate packages)
Why does it require pmix at the Slurm version number? Am I doing
something wrong?
Thanks for your help
Patrick
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-leave@lists.schedmd.com