Hi!
I manage a small CentOS 8 cluster running Slurm 20.11.7-1 and OpenMPI built from source.
- I know this OS is no longer maintained and I need to negotiate downtime to reinstall.
- I know Slurm 20.11.7 has security issues (I built it from source some years ago with rpmbuild -ta --with mysql --with hwloc slurm-20.11.7.tar.bz) and I should update.
Everything was running fine until I added a GPU node and the Nvidia SDK. This SDK provides a GPU-aware OpenMPI 3 implementation, but I'm unable to launch an intranode parallel job with it using srun:
--------------------------------------------------------------------------
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1
  or PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.
--------------------------------------------------------------------------
I checked with "srun --mpi=list" and pmix is not listed:
    srun: MPI types are...
    srun: pmi2
    srun: cray_shasta
    srun: none
So I decided to build the rpms from slurm-20.11.9.tar.bz2, as I had done previously for 20.11.7, and update. I first installed pmix-2.1.1-1 from the source rpm, as I had no pmix-devel rpm in my local CentOS 8 repo:
    rpm --rebuild pmix-2.1.1-1.el8.src.rpm
    dnf install pmix-devel-2.1.1-1.el8.x86_64.rpm pmix-2.1.1-1.el8.x86_64.rpm
Then I built Slurm from slurm-20.11.9.tar.bz2 (just changing python3 to python38 in the spec file):
    rpmbuild -ta --with mysql --with hwloc --with pmix slurm-20.11.9.tar.bz2
And then I tried to install these packages on the GPU node:
    dnf install slurm-slurmd-20.11.9-1.el8.x86_64.rpm slurm-20.11.9-1.el8.x86_64.rpm slurm-devel-20.11.9-1.el8.x86_64.rpm slurm-libpmi-20.11.9-1.el8.x86_64.rpm
But I get this strange error:
    Error:
     Problem: conflicting requests
      - nothing provides pmix = 20.11.9 needed by slurm-slurmd-20.11.9-1.el8.x86_64
    (try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
Why does the package require pmix with the Slurm version number? Did I do something wrong?
Thanks for your help
Patrick
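
A quick way to compare what the built package asks for with what the installed pmix offers is to query both with rpm. This is only a sketch: pmix_deps is a hypothetical helper name, and the package file name is the one from the dnf command above.

```shell
# Hypothetical helper: keep only the pmix-related entries of an RPM
# dependency list read from stdin (prints nothing if there are none).
pmix_deps() {
    grep -i 'pmix' || true
}

# Usage on the host where the rpms were built and pmix is installed:
#   rpm -qp --requires slurm-slurmd-20.11.9-1.el8.x86_64.rpm | pmix_deps
#   rpm -q --provides pmix | pmix_deps
```

If the first query shows "pmix = 20.11.9" while the second shows a 2.1.1 version, the requirement recorded in the slurm-slurmd package cannot be satisfied by the installed pmix, which matches the dnf message.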
Hello
It might be related to the pmix installation path, which you did not set. Even if pmix is installed in a standard location, I think you have to set it:
# --with pmix %%_with_pmix path require pmix support
Here is what we have in our spec file, as an example:

    # Build with pmix by default on linux
    %bcond_without pmix
    %global pmix_path /opt/pmix4/4.2.2
    %if %{with pmix}
    BuildRequires: pmix4
    Requires: pmix4
    Conflicts: pmix-libpmi

    %{?with_pmix:--with-pmix=%{pmix_path}} \
Regards
Regine Gaudin
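
Without editing the spec file, the path can possibly be passed at build time through the _with_pmix macro named in the spec comment quoted above. This is a sketch under assumptions: it supposes the stock slurm.spec hands %{?_with_pmix} to configure, that pmix is installed under /usr, and pmix_define is a hypothetical helper name.

```shell
# Hypothetical helper: build the value for rpmbuild's --define option.
# $1 is the pmix installation prefix (assumption: /usr for the el8 rpm).
pmix_define() {
    printf '_with_pmix --with-pmix=%s\n' "$1"
}

# Usage (same options as the original build, with the path made explicit):
#   rpmbuild -ta --with mysql --with hwloc \
#       --define "$(pmix_define /usr)" slurm-20.11.9.tar.bz2
```

Check the spec file shipped in the tarball before relying on this, since the way the macro is consumed may differ between Slurm versions.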
________________________________
From: Patrick Begou via slurm-users slurm-users@lists.schedmd.com
Sent: Wednesday, 14 February 2024 18:02
To: Slurm User Community List
Subject: [slurm-users] Problem building slurm with PMIx