Steven,
Looks like you may have had a secondary controller that took over and changed your StateSave files.
IF you don't need the job info AND no jobs are running, you can just rename or delete your StateSaveLocation directory and its contents will be recreated. Job numbers will start over unless you set FirstJobId, which you should do if you want to keep your sacct data.
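A minimal sketch of the rename step, assuming every slurmctld is already stopped and no jobs are running. The helper name archive_state_dir and the /var/spool/slurmctld path in the usage note are illustrative only; take the real path from the StateSaveLocation line in your slurm.conf. Renaming rather than deleting keeps the old state files around in case they are needed later.

```shell
#!/bin/sh
# Hypothetical helper: move a state directory aside under a timestamped name
# and leave an empty directory in its place. Prints the backup path.
archive_state_dir() {
    state_dir=$1
    backup="${state_dir}.bak.$(date +%Y%m%d%H%M%S)"
    if [ ! -d "$state_dir" ]; then
        echo "no such directory: $state_dir" >&2
        return 1
    fi
    mv "$state_dir" "$backup" || return 1
    mkdir -p "$state_dir" || return 1
    echo "$backup"
}
```

Usage would look something like `archive_state_dir /var/spool/slurmctld` (path assumed). The new empty directory is owned by whoever ran the helper, so it still needs to be handed to the user slurmctld runs as before the daemon is started.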
It also looks like your logging does not have permissions. Change SlurmctldLogFile to be something like /var/log/slurm/slurmctld.log and set the owner of /var/log/slurm to the slurm user.
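Something along these lines would set up the log directory. This is a sketch only: the helper name prepare_log_dir and the 750 mode are my assumptions, and "slurm" stands in for whatever SlurmUser is set to on your system.

```shell
#!/bin/sh
# Hypothetical helper: create a log directory and give it to the given owner.
prepare_log_dir() {
    log_dir=$1
    owner=$2
    mkdir -p "$log_dir" || return 1
    chown "$owner" "$log_dir" || return 1
    # 750 is an assumed mode: owner full access, group read, others none.
    chmod 750 "$log_dir"
}

# Intended use, as root (user name assumed):
#   prepare_log_dir /var/log/slurm slurm
# then in slurm.conf:
#   SlurmctldLogFile=/var/log/slurm/slurmctld.log
```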
Ensure all slurmctld daemons are down, then start the first. Once it is up (you can check with 'scontrol show config'), start the second. Run 'scontrol show config' again and you should see both daemons listed as 'up' at the end of the output.
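The start-up order above could be scripted roughly as follows. The wait_until helper is hypothetical, and the systemd unit name slurmctld is taken from your log output. I am assuming 'scontrol show config' exits non-zero while the controller is unreachable, which is what makes it usable as a readiness check here.

```shell
#!/bin/sh
# Hypothetical helper: retry a command once per second until it succeeds
# or the given number of attempts is used up.
wait_until() {
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}

# Intended sequence:
#   on both controllers:   systemctl stop slurmctld
#   on the first:          systemctl start slurmctld
#                          wait_until 30 scontrol show config
#   on the second:         systemctl start slurmctld
#   on either:             scontrol show config | tail -n 5
```

The last command only shortens the output so the controller status lines at the end are easy to spot.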
-Brian Andrus
On 2/3/2025 7:29 PM, Steven Jones via slurm-users wrote:
From the logs, 2 errors:
8><---
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz systemd[1]: Starting Slurm controller daemon...
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz slurmctld[1045020]: slurmctld: error: chdir(/var/log): Permission denied
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz slurmctld[1045020]: slurmctld: slurmctld version 24.11.1 started on cluster poc-cluster(2175)
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz systemd[1]: Started Slurm controller daemon.
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz slurmctld[1045020]: slurmctld: fatal: Can not recover assoc_usage state, incompatible version, got 9728 need >= 9984 <= 10752, start with '-i' to ignore this. Warning: using -i will lose the data that can't be recovered.
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz systemd[1]: slurmctld.service: Failed with result 'exit-code'.
No idea about "slurmctld: error: chdir(/var/log): Permission denied"; I need more info, but the log itself seems to be written OK, as we can see.
"fatal: Can not recover assoc_usage state, incompatible version,"
This seems to come from my attempt to upgrade from ver22 to ver24, but Google tells me ver22 "left a mess" and ver24 can't cope. Where would I go looking to clean up, please?
regards
Steven