G'Day all,

I've been upgrading my cluster from 20.11.0 in small steps to get to 24.05.2. Currently I have all nodes on 23.02.8, the controller on 24.05.2 and a single test node on 24.05.2. All are CentOS 7.9 (the upgrade to Oracle Linux 8.10 is Phase 2 of the upgrades).

When I check the slurmd status on the test node I get:

[root@hpc-dev-01 24.05.2]# systemctl status slurmd
● slurmd.service - Slurm node daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2024-08-15 10:45:15 AEST; 24s ago
 Main PID: 46391 (slurmd)
    Tasks: 1
   Memory: 1.2M
   CGroup: /system.slice/slurmd.service
           └─46391 /usr/sbin/slurmd --systemd

Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: Considering each NUMA node as a socket
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: Node reconfigured socket/core boundaries SocketsPerBoard=4:8(hw) CoresPerSocket=16:8(hw)
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: Considering each NUMA node as a socket
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: slurmd version 24.05.2 started
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: plugin_load_from_file: Incompatible Slurm plugin /usr/lib64/slurm/mpi_none.so version (23.02.8)
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: error: Couldn't load specified plugin name for mpi/none: Incompatible plugin version
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: error: MPI: Cannot create context for mpi/none
Aug 15 10:45:15 hpc-dev-01 systemd[1]: Started Slurm node daemon.
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: slurmd started on Thu, 15 Aug 2024 10:45:15 +1000
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: CPUs=64 Boards=1 Sockets=8 Cores=8 Threads=1 Memory=257778 TmpDisk=15998 Uptime=2898769 CPUSpecL...ve=(null)
Hint: Some lines were ellipsized, use -l to show in full.
[root@hpc-dev-01 24.05.2]#

We don't use MPI (life science workloads)... should I remove the file? If it is version 23.02.8, then doesn't 24.05.2 have that plugin built in? There are no references to mpi in the slurm.conf file.
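
In case it helps, here is a rough sketch of how the leftover file could be traced back to a package before removing anything. It assumes the node was installed from RPMs (which I believe is the case here, but treat it as an assumption), and plugin_path_from_log is just a made-up helper name, not anything shipped with Slurm. It pulls the plugin path out of the log line above and, only if rpm is present and the file exists, asks which package owns it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): extract the plugin path from an
# "Incompatible Slurm plugin" log line read on stdin.
plugin_path_from_log() {
    sed -n 's/.*Incompatible Slurm plugin \([^ ]*\) version.*/\1/p'
}

# Log line copied from the slurmd output above.
line='slurmd: plugin_load_from_file: Incompatible Slurm plugin /usr/lib64/slurm/mpi_none.so version (23.02.8)'
plugin=$(printf '%s\n' "$line" | plugin_path_from_log)
echo "plugin: $plugin"

# Assumption: RPM-based install. Skip quietly if rpm or the file is missing.
if command -v rpm >/dev/null 2>&1 && [ -e "$plugin" ]; then
    rpm -qf "$plugin"            # which package (if any) owns the file
    rpm -qa | grep -i '^slurm'   # installed slurm packages and their versions
fi
```

If rpm -qf reports that no package owns the file, or names a 23.02.8 package, that would at least show where the stale plugin came from.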



Sid