[slurm-users] SLURMDBD fails trying to talk to MariaDB - Help debugging configuration
Aravindh Sampathkumar
aravindh at fastmail.com
Thu Oct 11 14:58:57 MDT 2018
Hello.
I'm trying to set up a SLURM cluster in a virtual environment before
actually deploying it for serious work. I hit a snag where slurmdbd
fails soon after starting because of trouble connecting to MariaDB.
SlurmDBD service status:
[root@slmaster ~]# systemctl status slurmdbd
● slurmdbd.service - Slurm DBD accounting daemon
   Loaded: loaded (/etc/systemd/system/slurmdbd.service; enabled; vendor preset: disabled)
   Active: failed (Result: timeout) since Thu 2018-10-11 20:34:42 UTC; 14min ago
  Process: 1406 ExecStart=/usr/sbin/slurmdbd $SLURMDBD_OPTIONS (code=exited, status=0/SUCCESS)

Oct 11 20:33:11 slmaster systemd[1]: Starting Slurm DBD accounting daemon...
Oct 11 20:33:11 slmaster systemd[1]: PID file /var/run/slurmdbd.pid not readable (yet?) after start.
Oct 11 20:34:42 slmaster systemd[1]: slurmdbd.service start operation timed out. Terminating.
Oct 11 20:34:42 slmaster systemd[1]: Failed to start Slurm DBD accounting daemon.
Oct 11 20:34:42 slmaster systemd[1]: Unit slurmdbd.service entered failed state.
Oct 11 20:34:42 slmaster systemd[1]: slurmdbd.service failed.
MariaDB running just fine:
[root@slmaster ~]# systemctl status mariadb
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2018-10-11 20:33:11 UTC; 18min ago
  Process: 991 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)
  Process: 943 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)
 Main PID: 989 (mysqld_safe)
   CGroup: /system.slice/mariadb.service
           ├─ 989 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
           └─1265 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/maria...

Oct 11 20:33:07 slmaster systemd[1]: Starting MariaDB database server...
Oct 11 20:33:07 slmaster mariadb-prepare-db-dir[943]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.
Oct 11 20:33:08 slmaster mysqld_safe[989]: 181011 20:33:08 mysqld_safe Logging to '/var/log/mariadb/mariadb.log'.
Oct 11 20:33:09 slmaster mysqld_safe[989]: 181011 20:33:09 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
Oct 11 20:33:11 slmaster systemd[1]: Started MariaDB database server.
Logfile from Slurmdbd:
[2018-10-11T20:33:11.648] debug: Log file re-opened
[2018-10-11T20:33:11.720] debug: Munge authentication plugin loaded
[2018-10-11T20:33:11.749] debug2: mysql_connect() called for db slurm_acct_db
[2018-10-11T20:33:11.785] debug2: innodb_buffer_pool_size: 629145600
[2018-10-11T20:33:11.785] debug2: innodb_log_file_size: 67108864
[2018-10-11T20:33:11.786] debug2: innodb_lock_wait_timeout: 900
[2018-10-11T20:33:11.956] Accounting storage MYSQL plugin loaded
[2018-10-11T20:33:11.958] debug2: ArchiveDir = /tmp
[2018-10-11T20:33:11.958] debug2: ArchiveScript = (null)
[2018-10-11T20:33:11.958] debug2: AuthInfo = (null)
[2018-10-11T20:33:11.958] debug2: AuthType = auth/munge
[2018-10-11T20:33:11.958] debug2: CommitDelay = 0
[2018-10-11T20:33:11.958] debug2: DbdAddr = slmaster
[2018-10-11T20:33:11.958] debug2: DbdBackupHost = (null)
[2018-10-11T20:33:11.958] debug2: DbdHost = slmaster
[2018-10-11T20:33:11.958] debug2: DbdPort = 6819
[2018-10-11T20:33:11.958] debug2: DebugFlags = (null)
[2018-10-11T20:33:11.958] debug2: DebugLevel = 6
[2018-10-11T20:33:11.958] debug2: DebugLevelSyslog = 10
[2018-10-11T20:33:11.958] debug2: DefaultQOS = (null)
[2018-10-11T20:33:11.958] debug2: LogFile = /var/log/slurm/slurmdbd.log
[2018-10-11T20:33:11.958] debug2: MessageTimeout = 10
[2018-10-11T20:33:11.958] debug2: Parameters = (null)
[2018-10-11T20:33:11.958] debug2: PidFile = /var/spool/slurm/slurmdbd.pid
[2018-10-11T20:33:11.958] debug2: PluginDir = /usr/lib64/slurm
[2018-10-11T20:33:11.958] debug2: PrivateData = none
[2018-10-11T20:33:11.958] debug2: PurgeEventAfter = NONE
[2018-10-11T20:33:11.958] debug2: PurgeJobAfter = NONE
[2018-10-11T20:33:11.958] debug2: PurgeResvAfter = NONE
[2018-10-11T20:33:11.958] debug2: PurgeStepAfter = NONE
[2018-10-11T20:33:11.958] debug2: PurgeSuspendAfter = NONE
[2018-10-11T20:33:11.958] debug2: PurgeTXNAfter = NONE
[2018-10-11T20:33:11.958] debug2: PurgeUsageAfter = NONE
[2018-10-11T20:33:11.958] debug2: SlurmUser = slurm(982)
[2018-10-11T20:33:11.958] debug2: StorageBackupHost = (null)
[2018-10-11T20:33:11.958] debug2: StorageHost = localhost
[2018-10-11T20:33:11.958] debug2: StorageLoc = slurm_acct_db
[2018-10-11T20:33:11.958] debug2: StoragePort = 3306
[2018-10-11T20:33:11.958] debug2: StorageType = accounting_storage/mysql
[2018-10-11T20:33:11.958] debug2: StorageUser = slurm
[2018-10-11T20:33:11.958] debug2: TCPTimeout = 2
[2018-10-11T20:33:11.958] debug2: TrackWCKey = 0
[2018-10-11T20:33:11.958] debug2: TrackSlurmctldDown= 0
[2018-10-11T20:33:11.958] debug2: acct_storage_p_get_connection: request new connection 1
[2018-10-11T20:33:11.974] slurmdbd version 18.08.1 started
[2018-10-11T20:33:11.986] debug2: running rollup at Thu Oct 11 20:33:11 2018
[2018-10-11T20:33:11.986] debug2: Everything rolled up
[2018-10-11T20:34:42.968] Terminate signal (SIGINT or SIGTERM) received
[2018-10-11T20:34:42.969] debug: rpc_mgr shutting down
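Two values in the output above can be compared directly: systemd reports waiting for /var/run/slurmdbd.pid, while slurmdbd logs PidFile = /var/spool/slurm/slurmdbd.pid. Below is a minimal Python sketch of that comparison. The helper names (read_setting, pidfiles_match) are hypothetical, and the two sample texts are assumptions built from the paths shown above, not the contents of the real unit file or the attached slurmdbd.conf. Whether this mismatch is what causes the start timeout is likewise an assumption to verify, not a confirmed diagnosis.

```python
def read_setting(text, key):
    """Return the value of the last `key=value` line in text, or None.

    Blank lines, comment lines, and section headers such as [Service]
    are skipped. Keys are compared case-insensitively, which is a
    simplification made for this sketch.
    """
    value = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("["):
            continue
        name, sep, rest = line.partition("=")
        if sep and name.strip().lower() == key.lower():
            value = rest.strip()
    return value


def pidfiles_match(unit_text, conf_text):
    """True if the unit's PIDFile equals the daemon's PidFile (plain string compare)."""
    return read_setting(unit_text, "PIDFile") == read_setting(conf_text, "PidFile")


# Hypothetical samples: only the two paths are taken from the output above.
UNIT_SAMPLE = """\
[Service]
PIDFile=/var/run/slurmdbd.pid
ExecStart=/usr/sbin/slurmdbd $SLURMDBD_OPTIONS
"""

CONF_SAMPLE = """\
# slurmdbd.conf (sample)
PidFile=/var/spool/slurm/slurmdbd.pid
"""

if __name__ == "__main__":
    print("unit PIDFile:", read_setting(UNIT_SAMPLE, "PIDFile"))
    print("conf PidFile:", read_setting(CONF_SAMPLE, "PidFile"))
    print("match:", pidfiles_match(UNIT_SAMPLE, CONF_SAMPLE))
```

To run the same check against a real system, pass the text of /etc/systemd/system/slurmdbd.service and of slurmdbd.conf in place of the samples.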
sacctmgr says:
[root@slmaster ~]# sacctmgr -vvvv
sacctmgr: Accounting storage SLURMDBD plugin loaded
sacctmgr: debug2: slurm_connect failed: Connection refused
sacctmgr: debug2: Error connecting slurm stream socket at 127.0.0.1:6819: Connection refused
sacctmgr: error: slurm_persist_conn_open_without_init: failed to open persistent connection to slmaster:6819: Connection refused
sacctmgr: error: slurmdbd: Sending PersistInit msg: Connection refused
sacctmgr: error: Problem talking to the database: Connection refused
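"Connection refused" on 127.0.0.1:6819 typically means no process was listening on that port when sacctmgr ran, which would be consistent with slurmdbd having already been terminated. A small Python sketch for probing a TCP port is shown below. The function name port_is_open is hypothetical, and the default timeout of 2 seconds is an arbitrary choice for this example.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False.

    Refused connections, timeouts, and name-resolution failures are all
    subclasses of OSError, so they are all reported as False here.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_is_open("slmaster", 6819) while slurmdbd is supposed to be running shows whether the daemon is reachable on DbdPort, independently of sacctmgr.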
I am able to connect to MariaDB locally using the above settings...
[root@slmaster ~]# mysql -u slurm -h localhost -P 3306 -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 7
Server version: 5.5.60-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]>
Both config files, slurm.conf and slurmdbd.conf, are attached.
The MariaDB configuration was not changed:
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted
# security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
#
# include all files from the config directory
#
!includedir /etc/my.cnf.d
I'd appreciate any help troubleshooting the "Connection refused" error.
Thanks,
--
Aravindh Sampathkumar
aravindh at fastmail.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: slurm.conf
Type: application/octet-stream
Size: 3339 bytes
Desc: not available
URL: <http://lists.schedmd.com/pipermail/slurm-users/attachments/20181011/b54742c8/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: slurmdbd.conf
Type: application/octet-stream
Size: 742 bytes
Desc: not available
URL: <http://lists.schedmd.com/pipermail/slurm-users/attachments/20181011/b54742c8/attachment-0003.obj>