Hi all,
I am having an issue with the new version of Slurm, 23.11.0-1.
I had already installed and configured slurm 23.02.3-1 on my cluster and all the services were active and running properly.
After installing the new version of Slurm with the same procedure, the slurmctld and slurmdbd daemons both fail to start with the same error:
(code=exited, status=217/USER)
Investigating the problem with the command journalctl -xe, I find:
slurmctld.service: Failed to determine user credentials: No such process
slurmctld.service: Failed at step USER spawning /usr/sbin/slurmctld: No such process
I compared the slurmctld.service files of the two Slurm versions and found the following differences in the [Service] section.
From the slurmctld.service file of slurm 23.02.3-1:
[Service]
Type=simple
EnvironmentFile=-/etc/sysconfig/slurmctld
EnvironmentFile=-/etc/default/slurmctld
ExecStart=/usr/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
LimitNOFILE=65536
TasksMax=infinity
From the slurmctld.service file of slurm 23.11.0-1:
[Service]
Type=notify
EnvironmentFile=-/etc/sysconfig/slurmctld
EnvironmentFile=-/etc/default/slurmctld
User=slurm
Group=slurm
ExecStart=/usr/sbin/slurmctld --systemd $SLURMCTLD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
LimitNOFILE=65536
TasksMax=infinity
I think the new User= and Group= lines for the slurm user might be the problem, but I am not sure, and I do not know how to solve it.
Can anyone help me?
Thanks in advance, Miriam
It looks like the slurm user does not exist on the system. Did you previously run slurmctld and slurmdbd as root? If you remove the two lines (User, Group), the services should start, running as root as before. But it is recommended to create a dedicated slurm user for that: https://slurm.schedmd.com/quickstart_admin.html#daemons
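A minimal sketch of how to check this, assuming the account name is slurm as in the User= line of the 23.11 unit file. The helper name user_exists and the SLURM_USER variable are only illustrative, and the suggested groupadd/useradd options and the /sbin/nologin path are assumptions that may need adjusting for your distribution. The script only prints the commands; it does not create anything itself.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named account can be resolved.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Account name taken from the User= line of the unit file (assumed "slurm").
SLURM_USER="${SLURM_USER:-slurm}"

if user_exists "$SLURM_USER"; then
    echo "user '$SLURM_USER' exists (uid $(id -u "$SLURM_USER"))"
else
    echo "user '$SLURM_USER' is missing; as root you could create it, for example:"
    # Suggested commands only (not executed here); adjust to local policy.
    echo "  groupadd --system $SLURM_USER"
    echo "  useradd --system --gid $SLURM_USER --no-create-home --shell /sbin/nologin $SLURM_USER"
fi
```

If the user has to be created, it should have the same UID and GID on every node of the cluster, and the Slurm state, spool and log directories will probably need to be owned by it before the daemons can start.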
On Fri, Jan 19, 2024, 16:02 Miriam Olmi miriam.olmi@lngs.infn.it wrote: