I have just rebuilt all my nodes, and sinfo (output at the end of this message) shows that
only nodes 1 and 2 seem to be available,
while nodes 3 to 6 are not.
Node 3's slurmd.log:
[root@node3 log]# tail slurmd.log
[2024-12-08T21:45:51.250] CPU frequency setting not configured for this node
[2024-12-08T21:45:51.251] slurmd version 20.11.9 started
[2024-12-08T21:45:51.252] slurmd started on Sun, 08 Dec 2024 21:45:51 +0000
[2024-12-08T21:45:51.252] CPUs=20 Boards=1 Sockets=20 Cores=1 Threads=1 Memory=48269 TmpDisk=23324 Uptime=30 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
[root@node3 log]#
And node 7 doesn't want to talk to the controller:
[root@node7 slurm]# sinfo
slurm_load_partitions: Zero Bytes were transmitted or received
[root@node7 slurm]#
All of these nodes were rebuilt; nodes 1 to 3 are identical to each other, and nodes 4 to 7 are identical to each other.
Node 7's log keeps saying:
[2024-12-08T21:49:17.246] error: Unable to register: Zero Bytes were transmitted or received
[2024-12-08T21:49:18.263] error: Unable to register: Zero Bytes were transmitted or received
[2024-12-08T21:49:19.278] error: Unable to register: Zero Bytes were transmitted or received
[2024-12-08T21:49:20.294] error: Unable to register: Zero Bytes were transmitted or received
[2024-12-08T21:49:21.310] error: Unable to register: Zero Bytes were transmitted or received
sinfo as seen from vuwunicoslurmd1:
[root@vuwunicoslurmd1 slurm]# sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug* up infinite 2 idle* node[1-2]
debug* up infinite 4 down* node[3-6]
[root@vuwunicoslurmd1 slurm]#