[slurm-users] Can't start slurmdbd
Juan A. Cordero Varelaq
bioinformatica-ibis at us.es
Mon Nov 20 02:50:48 MST 2017
Hi,
Slurm 17.02.3 was installed on my cluster some time ago, and I recently
decided to use SlurmDBD for accounting.
After installing several packages (slurm-devel, slurm-munge,
slurm-perlapi, slurm-plugins, slurm-slurmdbd and slurm-sql) and MariaDB
on CentOS 7, I created an SQL database:
mysql> grant all on slurm_acct_db.* TO 'slurm'@'localhost'
-> identified by 'some_pass' with grant option;
mysql> create database slurm_acct_db;
and configured the slurmdbd.conf file:
AuthType=auth/munge
DbdAddr=localhost
DbdHost=localhost
SlurmUser=slurm
DebugLevel=4
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
StoragePass=some_pass
StorageUser=slurm
StorageLoc=slurm_acct_db
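For reference, a minimal sketch of how the LogFile and PidFile settings above could be sanity-checked: it reads both values from the file and reports whether each parent directory exists and is writable by the user running the script. The helper names `conf_get` and `check_parent_dir` and the default path /etc/slurm/slurmdbd.conf are assumptions for illustration, not part of Slurm.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a key from a Key=Value file
# such as slurmdbd.conf. Comment lines are skipped, the key match is
# case-insensitive here, and the last matching line wins.
conf_get() {
    # $1 = key name, $2 = config file
    awk -F= -v k="$1" '
        /^[ \t]*#/ { next }
        /=/ {
            key = $1
            gsub(/[ \t]/, "", key)
            if (tolower(key) == tolower(k))
                val = substr($0, index($0, "=") + 1)
        }
        END { if (val != "") print val }
    ' "$2"
}

# Hypothetical helper: report whether the parent directory of a path
# exists and is writable by the user running this script.
check_parent_dir() {
    # $1 = file path taken from the config
    dir=$(dirname "$1")
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
    elif [ ! -w "$dir" ]; then
        echo "not writable: $dir"
    else
        echo "ok: $dir"
    fi
}

# Assumed default location; pass another path as the first argument.
conf=${1:-/etc/slurm/slurmdbd.conf}
if [ -r "$conf" ]; then
    for key in LogFile PidFile; do
        path=$(conf_get "$key" "$conf")
        if [ -n "$path" ]; then
            echo "$key: $(check_parent_dir "$path")"
        fi
    done
fi
```

The writability test only means something when the script runs as the SlurmUser (for example via `sudo -u slurm sh check.sh`, where check.sh is a hypothetical file name); as root, `-w` is almost always true.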
Then I stopped the slurmctld daemon on the head node of my cluster and
tried to start `slurmdbd`, but I got the following:
$ systemctl start slurmdbd
Job for slurmdbd.service failed because the control process exited with error code. See "systemctl status slurmdbd.service" and "journalctl -xe" for details.
$ systemctl status slurmdbd.service
● slurmdbd.service - Slurm DBD accounting daemon
Loaded: loaded (/etc/systemd/system/slurmdbd.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since lun 2017-11-20 10:39:26 CET; 53s ago
Process: 27592 ExecStart=/usr/sbin/slurmdbd $SLURMDBD_OPTIONS (code=exited, status=1/FAILURE)
nov 20 10:39:26 login_node systemd[1]: Starting Slurm DBD accounting daemon...
nov 20 10:39:26 login_node systemd[1]: slurmdbd.service: control process exited, code=exited status=1
nov 20 10:39:26 login_node systemd[1]: Failed to start Slurm DBD accounting daemon.
nov 20 10:39:26 login_node systemd[1]: Unit slurmdbd.service entered failed state.
nov 20 10:39:26 login_node systemd[1]: slurmdbd.service failed.
$ journalctl -xe
nov 20 10:39:26 login_node polkitd[1078]: Registered Authentication Agent for unix-process:27586:119889015 (system bus name :1.871 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /or
nov 20 10:39:26 login_node systemd[1]: Starting Slurm DBD accounting daemon...
-- Subject: Unit slurmdbd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit slurmdbd.service has begun starting up.
nov 20 10:39:26 login_node systemd[1]: slurmdbd.service: control process exited, code=exited status=1
nov 20 10:39:26 login_node systemd[1]: Failed to start Slurm DBD accounting daemon.
-- Subject: Unit slurmdbd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit slurmdbd.service has failed.
--
-- The result is failed.
nov 20 10:39:26 login_node systemd[1]: Unit slurmdbd.service entered failed state.
nov 20 10:39:26 login_node systemd[1]: slurmdbd.service failed.
nov 20 10:39:26 login_node polkitd[1078]: Unregistered Authentication Agent for unix-process:27586:119889015 (system bus name :1.871, object path /org/freedesktop/PolicyKit1/AuthenticationAgent,
nov 20 10:40:06 login_node gmetad[1519]: data_thread() for [HPCSIE] failed to contact node 192.168.2.10
nov 20 10:40:06 login_node gmetad[1519]: data_thread() got no answer from any [HPCSIE] datasource
nov 20 10:40:13 login_node dhcpd[2320]: DHCPREQUEST for 192.168.2.19 from XX:XX:XX:XX:XX:XX via enp6s0f1
nov 20 10:40:13 login_node dhcpd[2320]: DHCPACK on 192.168.2.19 to XX:XX:XX:XX:XX:XX via enp6s0f1
nov 20 10:40:39 login_node dhcpd[2320]: DHCPREQUEST for 192.168.2.13 from XX:XX:XX:XX:XX:XX via enp6s0f1
nov 20 10:40:39 login_node dhcpd[2320]: DHCPACK on 192.168.2.13 to XX:XX:XX:XX:XX:XX via enp6s0f1
I've just found out the file `/var/run/slurmdbd.pid` does not even exist.
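A missing PID file is what one would expect while the daemon is not running, so on its own it may not point to the cause. As a rough sketch, a helper like the one below distinguishes a missing PID file from a stale one; the function name `pidfile_status` is hypothetical and not part of Slurm.

```shell
#!/bin/sh
# Hypothetical helper: classify a PID file as absent, stale, or live.
# Note: kill -0 also fails for a live process owned by another user
# when run unprivileged, so run this as root or as the daemon's user.
pidfile_status() {
    # $1 = path to the PID file
    if [ ! -f "$1" ]; then
        echo "no pid file"
        return 1
    fi
    pid=$(cat "$1")
    if kill -0 "$pid" 2>/dev/null; then
        echo "running: $pid"
    else
        echo "stale: $pid"
        return 1
    fi
}

# Example call using the PidFile path from the configuration above.
pidfile_status /var/run/slurmdbd.pid || true
```

Separately, the slurmdbd man page documents `-D` (run in the foreground, logging to stdout) and `-v` (more verbose output), so running `slurmdbd -D -vvv` by hand may show the startup error that the systemd output does not.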
I'd appreciate any hint on this issue.
Thanks