Dear Slurm users,
This is my first time setting up Slurm, and I am looking for a solution to the error below. Has anyone here already encountered this problem? I would really appreciate the help. mariadb, slurmdbd and slurmd are active.
× slurmctld.service - Slurm controller daemon
Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; preset: enabled)
Active: failed (Result: exit-code) since Tue 2024-06-25 10:06:39 UTC; 2min 42s ago
Duration: 584ms
Docs: man:slurmctld(8)
Process: 63738 ExecStart=/usr/sbin/slurmctld --systemd $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
Main PID: 63738 (code=exited, status=1/FAILURE)
CPU: 25ms
Jun 25 10:06:39 server systemd[1]: Starting slurmctld.service - Slurm controller daemon...
Jun 25 10:06:39 server (lurmctld)[63738]: slurmctld.service: Referenced but unset environment variable evaluates to an empty string: SLURMCTLD_OPTIONS
Jun 25 10:06:39 server slurmctld[63738]: slurmctld: slurmctld version 23.11.4 started on servercluster
Jun 25 10:06:39 server systemd[1]: Started slurmctld.service - Slurm controller daemon.
Jun 25 10:06:39 server slurmctld[63738]: slurmctld: accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld: Registering slurmctld at port 6817 with slurmdbd
Jun 25 10:06:39 server slurmctld[63738]: slurmctld: priority/multifactor: _read_last_decay_ran: No last decay (/var/spool/slurm/state/priority_last_decay_ran) to recover
Jun 25 10:06:39 server slurmctld[63738]: slurmctld: No memory enforcing mechanism configured.
Jun 25 10:06:39 server slurmctld[63738]: slurmctld: fatal: Can not recover last_conf_lite, incompatible version, (9472 not between 9728 and 10240), start with '-i' to ignore this. Warning: using -i will lose the data that can't be recovered.
Jun 25 10:06:39 server systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Jun 25 10:06:39 server systemd[1]: slurmctld.service: Failed with result 'exit-code'.
What about the “Referenced but unset environment variable evaluates to an empty string: SLURMCTLD_OPTIONS” message? Meanwhile, you can check slurmctld.log and journalctl -u slurmctld --no-pager.
Hello, slurmctld.log and journalctl -u slurmctld --no-pager give the same information that I have already provided. The “Referenced but unset environment variable evaluates to an empty string: SLURMCTLD_OPTIONS” message has to do with the files in /etc/default (slurmdbd/slurmctld/slurmd), where there is a line: SLURMDBD_OPTIONS="".
But it has nothing to do with the fact that the daemon is not active.
On Tue, Jun 25, 2024 at 3:49 PM daijiangkuicgo--- via slurm-users <slurm-users@lists.schedmd.com> wrote:
What about the “Referenced but unset environment variable evaluates to an empty string: SLURMCTLD_OPTIONS” message? Meanwhile, you can check slurmctld.log and journalctl -u slurmctld --no-pager.
-- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-leave@lists.schedmd.com
Hello,
I suppose the actual error is:
slurmctld: fatal: Can not recover last_conf_lite, incompatible version, (9472 not between 9728 and 10240), start with '-i' to ignore this. Warning: using -i will lose the data that can't be recovered.
Did you upgrade from Slurm 21.08 (9472) to your current version 23.11 (10240)? See here for a reference to the numbers: https://github.com/SchedMD/slurm/blob/40058e4df5fa243f4c340db9622ed559ce7717...
You have to stay within a two-release window for upgrades to work.
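The numbers in the error message can be decoded by hand. The following is only a minimal sketch, assuming that the protocol version is packed as (index << 8) | minor and that the index maps to releases as 37 = 21.08, 38 = 22.05, 39 = 23.02, 40 = 23.11; the function names decode_proto and release_name are made up for illustration. Please check the mapping against the header linked above for your own version.

```shell
#!/bin/sh
# Extract the high byte of a Slurm protocol version number.
# Assumption: versions are packed as (index << 8) | minor.
decode_proto() {
    echo $(( $1 >> 8 ))
}

# Map the high byte to a release name.
# Assumption: this table matches the header referenced in the thread.
release_name() {
    case "$1" in
        37) echo "21.08" ;;
        38) echo "22.05" ;;
        39) echo "23.02" ;;
        40) echo "23.11" ;;
        *)  echo "unknown" ;;
    esac
}

# The three numbers from the fatal error message.
for v in 9472 9728 10240; do
    echo "$v -> $(release_name "$(decode_proto "$v")")"
done
```

Read this way, the saved state was written by 21.08, while 23.11 only accepts state from 22.05 up to 23.11, which is why the controller refuses to start.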
Best regards, Lorenzo
On 25/06/24 16:30, stth via slurm-users wrote:
Hello, slurmctld.log and journalctl -u slurmctld --no-pager give the same information that I have already provided. The “Referenced but unset environment variable evaluates to an empty string: SLURMCTLD_OPTIONS” message has to do with the files in /etc/default (slurmdbd/slurmctld/slurmd), where there is a line: SLURMDBD_OPTIONS="".
But it has nothing to do with the fact that the daemon is not active.
Hello Lorenzo,
Thank you for your reply. Yes, I have version 23.11.4.
Lorenzo Bosio lorenzo.bosio@unito.it wrote on Tue, 25 June 2024 at 16:50:
Hello,
I suppose the actual error is:
slurmctld: fatal: Can not recover last_conf_lite, incompatible version, (9472 not between 9728 and 10240), start with '-i' to ignore this. Warning: using -i will lose the data that can't be recovered.
Did you upgrade from Slurm 21.08 (9472) to your current version 23.11 (10240)? See here for a reference to the numbers: https://github.com/SchedMD/slurm/blob/40058e4df5fa243f4c340db9622ed559ce7717...
You have to stay within a two-release window for upgrades to work.
Best regards, Lorenzo
On 25/06/2024 12:20, stth via slurm-users wrote:
Jun 25 10:06:39 server slurmctld[63738]: slurmctld: fatal: Can not recover last_conf_lite, incompatible version, (9472 not between 9728 and 10240), start with '-i' to ignore this. Warning: using -i will lose the data that can't be recovered.
Seems like it's not the first time, but the first time in a long while. If there is no important data in that old db, just do what the error says as a one-off.
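Before starting with -i, it may be worth archiving the old state directory so that the files can still be inspected later. This is only a sketch, assuming the state directory is /var/spool/slurm/state (the path that appears in the log above; the authoritative value is StateSaveLocation in slurm.conf). The function name backup_state and the backup file name are hypothetical.

```shell
#!/bin/sh
# Archive a Slurm state directory into a compressed tarball.
# Arguments: the state directory and the output file.
backup_state() {
    state_dir=$1
    out_file=$2
    # Refuse to continue if the directory does not exist.
    [ -d "$state_dir" ] || { echo "no such directory: $state_dir" >&2; return 1; }
    # Store paths relative to the parent so the archive unpacks cleanly.
    tar -czf "$out_file" -C "$(dirname "$state_dir")" "$(basename "$state_dir")"
}

# Example (as root, with slurmctld stopped); both paths are assumptions:
# backup_state /var/spool/slurm/state /root/slurm-state-backup.tar.gz
```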
Hi Timo,
Thanks. The old data wasn’t important, so I did that. I changed the ExecStart line in /usr/lib/systemd/system/slurmctld.service as follows:
ExecStart=/usr/sbin/slurmctld --systemd -i $SLURMCTLD_OPTIONS
slurmctld is now active.
Timo Rothenpieler via slurm-users slurm-users@lists.schedmd.com wrote on Tue, 25 June 2024 at 17:26:
Seems like it's not the first time, but the first time in a long while. If there is no important data in that old db, just do what the error says as a one-off.
On 25.06.2024 17:54, stth via slurm-users wrote:
Hi Timo,
Thanks. The old data wasn’t important, so I did that. I changed the ExecStart line in /usr/lib/systemd/system/slurmctld.service as follows:
ExecStart=/usr/sbin/slurmctld --systemd -i $SLURMCTLD_OPTIONS
You should be able to remove it again immediately. I would probably have just launched slurmctld manually from the command line with -i once.
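A one-off manual start could look roughly like the sketch below. It assumes that -i has already been removed from the unit file again, that slurmctld is on the PATH, and that the commands are run as root. The helper names step and one_off_ignore_state are made up for illustration. By default the script only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Print each command, or run it when RUN=1 is set in the environment.
step() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

one_off_ignore_state() {
    # Make sure the service is not running while we start it by hand.
    step systemctl stop slurmctld
    # -D keeps slurmctld in the foreground, -i ignores state it cannot recover.
    # Interrupt it (Ctrl-C) once it has started without the fatal error.
    step slurmctld -D -i
    # Pick up the reverted unit file, then return to the normal service.
    step systemctl daemon-reload
    step systemctl start slurmctld
}

one_off_ignore_state
```

Keeping -i out of the unit file matters because, left in ExecStart, it would presumably make every later restart ignore unrecoverable state as well, not just this one.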