It sounds like the new version was built with different options, and/or the install was not done via packages.
If you do use RPMs, you could try:
dnf provides /usr/lib64/slurm/mpi_none.so
If that shows a package that is installed, remove the package. If it shows nothing, move the file elsewhere and check whether slurmd is happier.
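Something like this, as a rough sketch (the package name below is just a placeholder for whatever dnf actually reports):

# If dnf reports an installed package owning the file, remove that package:
dnf remove <package-reported-by-dnf>
# If nothing owns the file, move the stale plugin out of the way instead:
mv /usr/lib64/slurm/mpi_none.so /root/mpi_none.so.bak
# Then restart slurmd and recheck the log:
systemctl restart slurmd
systemctl status slurmd -l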
Brian Andrus
G'Day all,
I've been upgrading my cluster from 20.11.0 in small steps to get to 24.05.2. Currently I have all nodes on 23.02.8, the controller on 24.05.2 and a single test node on 24.05.2. All are CentOS 7.9 (upgrade to Oracle Linux 8.10 is Phase 2 of the upgrades).
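As a side note, a quick way to confirm which slurmd version each node is actually running (if I'm reading the sinfo man page right, %v prints the node's reported slurmd version):

# On a node: print the slurmd version directly
slurmd -V
# From the controller: list each node with its reported slurmd version
sinfo -N -o "%N %v"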
When I check the slurmd status on the test node I get:
[root@hpc-dev-01 24.05.2]# systemctl status slurmd
● slurmd.service - Slurm node daemon
Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2024-08-15 10:45:15 AEST; 24s ago
Main PID: 46391 (slurmd)
Tasks: 1
Memory: 1.2M
CGroup: /system.slice/slurmd.service
└─46391 /usr/sbin/slurmd --systemd
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: Considering each NUMA node as a socket
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: Node reconfigured socket/core boundaries SocketsPerBoard=4:8(hw) CoresPerSocket=16:8(hw)
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: Considering each NUMA node as a socket
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: slurmd version 24.05.2 started
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: plugin_load_from_file: Incompatible Slurm plugin /usr/lib64/slurm/mpi_none.so version (23.02.8)
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: error: Couldn't load specified plugin name for mpi/none: Incompatible plugin version
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: error: MPI: Cannot create context for mpi/none
Aug 15 10:45:15 hpc-dev-01 systemd[1]: Started Slurm node daemon.
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: slurmd started on Thu, 15 Aug 2024 10:45:15 +1000
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: CPUs=64 Boards=1 Sockets=8 Cores=8 Threads=1 Memory=257778 TmpDisk=15998 Uptime=2898769 CPUSpecL...ve=(null)
Hint: Some lines were ellipsized, use -l to show in full.
[root@hpc-dev-01 24.05.2]#
We don't use MPI (life science workloads)... should I remove the file? If it is version 23.02.8, then doesn't 24.05.2 have that plugin built in? There are no references to mpi in the slurm.conf file.
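In case it helps, a quick check (assuming the stock plugin directory) of which MPI plugin files the 24.05.2 build shipped, and which MPI types Slurm itself reports:

# List the MPI plugin files the new build installed
ls -l /usr/lib64/slurm/mpi_*.so
# Ask Slurm which MPI plugin types it knows about
srun --mpi=list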
Sid