[slurm-users] Problem with accounting/slurmdbd
Uwe Seher
uwe.seher at gmail.com
Wed Nov 13 09:28:55 UTC 2019
Just for completeness:
There was a lock in the database while a table was being created, which
you can see with:
MariaDB [slurm_acct_db]> show full processlist;
+----+-------------+-----------+---------------+---------+------+---------------------------------+-----------------------+----------+
| Id | User        | Host      | db            | Command | Time | State                           | Info                  | Progress |
+----+-------------+-----------+---------------+---------+------+---------------------------------+-----------------------+----------+
|  1 | system user |           | NULL          | Daemon  | NULL | InnoDB purge coordinator        | NULL                  |    0.000 |
|  3 | system user |           | NULL          | Daemon  | NULL | InnoDB purge worker             | NULL                  |    0.000 |
|  4 | system user |           | NULL          | Daemon  | NULL | InnoDB purge worker             | NULL                  |    0.000 |
|  2 | system user |           | NULL          | Daemon  | NULL | InnoDB purge worker             | NULL                  |    0.000 |
|  5 | system user |           | NULL          | Daemon  | NULL | InnoDB shutdown handler         | NULL                  |    0.000 |
| 11 | slurm       | localhost | slurm_acct_db | Sleep   |  943 |                                 | NULL                  |    0.000 |
| 12 | slurm       | localhost | slurm_acct_db | Sleep   |   13 |                                 | NULL                  |    0.000 |
| 20 | slurm       | localhost | slurm_acct_db | Query   |  307 | Waiting for table metadata lock | create table ... [1]  |    0.000 |
| 22 | root        | localhost | slurm_acct_db | Query   |    0 | init                            | show full processlist |    0.000 |
+----+-------------+-----------+---------------+---------+------+---------------------------------+-----------------------+----------+

[1] Full Info text for Id 20:
create table if not exists
"mpi_ibk_event_table" (`time_start` bigint unsigned not null,
`time_end` bigint unsigned default 0 not null, `node_name` tinytext
default '' not null, `cluster_nodes` text not null default '',
`reason` tinytext not null, `reason_uid` int unsigned default
0xfffffffe not null, `state` smallint unsigned default 0 not null,
`tres` text not null default '', primary key (node_name(20),
time_start)) engine='innodb'
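
For anyone reading the archive later, here is a minimal Python sketch of how the processlist output above can be read. The rows are transcribed by hand from that output (Info column left out); the names Session, lock_waiters and candidate_blockers are made up for this illustration. The "candidate blocker" rule (a sleeping session on the same database that has been idle at least as long as the waiter has been waiting) is only a heuristic and an assumption on my part: the processlist alone does not show which session actually holds the metadata lock.

```python
from typing import List, NamedTuple, Optional


class Session(NamedTuple):
    """One processlist row (hypothetical helper type; Info column omitted)."""
    id: int
    user: str
    db: Optional[str]
    command: str
    time: Optional[int]  # value of the Time column, in seconds
    state: str


# Rows transcribed from the processlist output in this post.
SESSIONS = [
    Session(1, "system user", None, "Daemon", None, "InnoDB purge coordinator"),
    Session(3, "system user", None, "Daemon", None, "InnoDB purge worker"),
    Session(4, "system user", None, "Daemon", None, "InnoDB purge worker"),
    Session(2, "system user", None, "Daemon", None, "InnoDB purge worker"),
    Session(5, "system user", None, "Daemon", None, "InnoDB shutdown handler"),
    Session(11, "slurm", "slurm_acct_db", "Sleep", 943, ""),
    Session(12, "slurm", "slurm_acct_db", "Sleep", 13, ""),
    Session(20, "slurm", "slurm_acct_db", "Query", 307,
            "Waiting for table metadata lock"),
    Session(22, "root", "slurm_acct_db", "Query", 0, "init"),
]


def lock_waiters(sessions: List[Session]) -> List[Session]:
    """Sessions whose State column reports a metadata-lock wait."""
    return [s for s in sessions
            if s.state == "Waiting for table metadata lock"]


def candidate_blockers(sessions: List[Session],
                       waiter: Session) -> List[Session]:
    """Heuristic only: sleeping sessions on the same database that have
    been idle at least as long as the waiter has been waiting."""
    if waiter.time is None:
        return []
    return [s for s in sessions
            if s.id != waiter.id
            and s.db == waiter.db
            and s.command == "Sleep"
            and s.time is not None
            and s.time >= waiter.time]


if __name__ == "__main__":
    for w in lock_waiters(SESSIONS):
        blockers = [s.id for s in candidate_blockers(SESSIONS, w)]
        print(f"Id {w.id} is waiting; candidate blockers: {blockers}")
```

With the rows above, the only waiter is Id 20 and the only candidate under this rule is Id 11, the connection that has been sleeping for 943 seconds.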
So I think this is what caused the second issue. The first issue is
solved too, although it is not entirely clear why. My current
explanation is that I assumed systemctl restart mysql would restart the
whole server (like postgres does ;)), but it did not do what I
expected. After an explicit stop followed by a separate start,
everything works like a charm.
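
A rough sketch of such an explicit stop/start sequence, written as a small Python helper. Several things here are assumptions: the unit name "mysql" is taken from the systemctl command mentioned above, the unit name "slurmdbd" is a guess, and stopping slurmdbd before the database (and starting it after) is my own ordering choice, not something confirmed in this thread. The functions restart_plan and run_plan are hypothetical names for this illustration.

```python
import subprocess
from typing import List


def restart_plan(db_unit: str = "mysql",
                 dbd_unit: str = "slurmdbd") -> List[List[str]]:
    """Build the command list: stop the database client first, then the
    database, and start them again in the reverse order.
    Unit names and ordering are assumptions, adjust for your system."""
    return [
        ["systemctl", "stop", dbd_unit],
        ["systemctl", "stop", db_unit],
        ["systemctl", "start", db_unit],
        ["systemctl", "start", dbd_unit],
    ]


def run_plan(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, or execute it when dry_run is False.
    check=True aborts the sequence at the first failing command."""
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling run_plan(restart_plan()) only prints the four commands; pass dry_run=False (as root) to actually run them.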
Thank you for your help!
On Tue, 12 Nov 2019 at 04:45, Brian Andrus <toomuchit at gmail.com> wrote:
> That second one can happen as a race condition. It may be doing an update
> or running a report or what-not when you ran your command.
>
> If the issue persists, restart mysql and slurmdbd.
>
> Brian Andrus
> On 11/11/2019 2:10 AM, Uwe Seher wrote:
>
> Hello!
> I would like to use accounting via slurmdbd/mariadb and have some problems
> with the connection to the database.
> When I try to connect via sacct or sacctmgr as a non-root user, the
> connection is refused completely:
>
> sacctmgr: add cluster MPI_IBK
> Adding Cluster(s)
> Name = mpi_ibk
> Would you like to commit changes? (You have 30 seconds to decide)
> (N/y): y
> Problem adding clusters: Access/permission denied
>
> I think this has something to do with the second problem, which occurs when I try to use sacctmgr as root:
>
> sacctmgr: add cluster name=mpi_ibk
> Adding Cluster(s)
> Name = mpi_ibk
> Would you like to commit changes? (You have 30 seconds to decide)
> (N/y): y
>
> Database is busy or waiting for lock from other user.
>
> The first problem is caused by missing configuration: by default, only the user 'root' is set up in the database and allowed to start transactions. For the second one I have no idea. The database is used only for slurm, I can log in with the configured user, and all daemons have been restarted and are working fine.
> Authentication inside slurm should work through the default munge service, and I think this is working to some extent, because the connection seems to be established. But I cannot do any configuration, so no further logging is possible. Some further information is below.
>
> Thank you in advance for some hints concerning this issue.
>
> Regards
>
> Uwe Seher
>
> The accounting setup in slurm.conf is the following:
>
> # ACCOUNTING
> JobAcctGatherType=jobacct_gather/linux
> JobAcctGatherFrequency=30
> # file
> JobCompType=jobcomp/filetxt
> JobCompLoc=/var/log/slurm_jobs.log
> #AccountingStorageType=accounting_storage/filetxt
> #AccountingStorageLoc=/var/log/slurm_acc.log
> #slurmdb
> AccountingStorageType=accounting_storage/slurmdbd
> AccountingStorageHost=localhost
> #AccountingStoragePass=*********
> AccountingStorageUser=slurm
>
> sacctmgr show configuration shows this:
>
> sacctmgr show configuration
> Configuration data as of 2019-11-11T10:58:04
> AccountingStorageBackupHost = (null)
> AccountingStorageHost = localhost
> AccountingStorageLoc = N/A
> AccountingStoragePass = (null)
> AccountingStoragePort = 6819
> AccountingStorageType = accounting_storage/slurmdbd
> AccountingStorageUser = N/A
> AuthType = auth/munge
> MessageTimeout = 10 sec
> PluginDir = /usr/lib64/slurm
> PrivateData = none
> SlurmUserId = slurm(400)
> SLURM_CONF = /etc/slurm/slurm.conf
> SLURM_VERSION = 17.11.13
> TCPTimeout = 2 sec
> TrackWCKey = 0
>
> SlurmDBD configuration:
> ArchiveDir = /tmp
> ArchiveEvents = No
> ArchiveJobs = No
> ArchiveResvs = No
> ArchiveScript = (null)
> ArchiveSteps = No
> ArchiveSuspend = No
> ArchiveTXN = No
> ArchiveUsage = No
> AuthInfo = (null)
> AuthType = auth/munge
> BOOT_TIME = 2019-11-11T09:29:01
> CommitDelay = No
> DbdAddr = localhost
> DbdBackupHost = (null)
> DbdHost = localhost
> DbdPort = 6819
> DebugFlags = (null)
> DebugLevel = verbose
> DebugLevelSyslog = quiet
> DefaultQOS = (null)
> LogFile = /var/log/slurmdbd.log
> MaxQueryTimeRange = UNLIMITED
> MessageTimeout = 10 secs
> PidFile = /var/run/slurm/slurmdbd.pid
> PluginDir = /usr/lib64/slurm
> PrivateData = none
> PurgeEventAfter = NONE
> PurgeJobAfter = NONE
> PurgeResvAfter = NONE
> PurgeStepAfter = NONE
> PurgeSuspendAfter = NONE
> PurgeTXNAfter = NONE
> PurgeUsageAfter = NONE
> SLURMDBD_CONF = /etc/slurm/slurmdbd.conf
> SLURMDBD_VERSION = 17.11.13
> SlurmUser = slurm(400)
> StorageBackupHost = (null)
> StorageHost = localhost
> StorageLoc = slurm_acct_db
> StoragePort = 3306
> StorageType = accounting_storage/mysql
> StorageUser = slurm
> TCPTimeout = 2 secs
> TrackWCKey = No
> TrackSlurmctldDown = No
>
>