[slurm-users] Slurm database failure messages

David Baker D.J.Baker at soton.ac.uk
Tue May 7 10:35:36 UTC 2019


Hello,

We are experiencing quite a number of database failures. A short while ago we saw an outright failure, after which we had to restart the MariaDB database and the slurmdbd process. Since the restart the database appears to be working well; however, over the last few days I have noticed quite a number of failures (see the example below). Does anyone understand what might be going wrong and why, and whether we should be concerned? I understand that Slurm databases can grow quite large relatively quickly, so I wonder whether this is memory related.

Best regards,
David

[root@blue51 slurm]# less slurmdbd.log-20190506.gz | grep failed
[2019-05-05T04:00:05.603] error: mysql_query failed: 1213 Deadlock found when trying to get lock; try restarting transaction
[2019-05-05T04:00:05.606] error: Cluster i5 rollup failed
[2019-05-05T23:00:07.017] error: mysql_query failed: 1213 Deadlock found when trying to get lock; try restarting transaction
[2019-05-05T23:00:07.018] error: Cluster i5 rollup failed
[2019-05-06T00:00:13.348] error: mysql_query failed: 1213 Deadlock found when trying to get lock; try restarting transaction
[2019-05-06T00:00:13.350] error: Cluster i5 rollup failed
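To see how often and when these failures occur, the log excerpt above can be summarised programmatically. The sketch below is a hypothetical helper, not part of Slurm. It assumes the `[timestamp] error: message` line format and the exact message strings shown in the excerpt. It pairs each MySQL 1213 deadlock error with the `Cluster ... rollup failed` line that follows it and counts the pairs by hour of day. It is only a triage aid for spotting patterns, not a diagnosis of the cause.

```python
import gzip
import re
import sys
from collections import Counter

# Hypothetical helper, not part of Slurm. Assumes slurmdbd error lines look
# like the excerpt: "[2019-05-05T04:00:05.603] error: <message>".
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] error: (?P<msg>.*)$")
ROLLUP_RE = re.compile(r"^Cluster (?P<cluster>\S+) rollup failed")
DEADLOCK_MARKER = "mysql_query failed: 1213 Deadlock"


def read_log(path):
    """Return the lines of a plain or gzip-compressed log file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as fh:
        return fh.readlines()


def summarize_rollup_failures(lines):
    """Pair each deadlock error with the next rollup failure.

    Returns (pairs, by_hour), where pairs is a list of
    (deadlock_ts, rollup_ts, cluster) tuples and by_hour counts the
    paired failures per hour of day ("00".."23"). Rollup failures with
    no preceding unpaired deadlock are not counted.
    """
    pairs = []
    pending = None  # timestamp of the latest deadlock not yet paired
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not "[ts] error: ..." entries
        ts, msg = m.group("ts"), m.group("msg")
        if DEADLOCK_MARKER in msg:
            pending = ts
            continue
        rm = ROLLUP_RE.match(msg)
        if rm and pending is not None:
            pairs.append((pending, ts, rm.group("cluster")))
            pending = None
    # ISO-style timestamps: characters 11-12 hold the hour.
    by_hour = Counter(deadlock_ts[11:13] for deadlock_ts, _, _ in pairs)
    return pairs, by_hour


if __name__ == "__main__" and len(sys.argv) > 1:
    pairs, by_hour = summarize_rollup_failures(read_log(sys.argv[1]))
    for deadlock_ts, rollup_ts, cluster in pairs:
        print(f"{deadlock_ts}  deadlock -> {cluster} rollup failed at {rollup_ts}")
    print("failures by hour:", dict(sorted(by_hour.items())))
```

Run it as, for example, `python3 rollup_failures.py slurmdbd.log-20190506.gz` (the script name is arbitrary). On the six lines quoted above it would report three deadlock/rollup pairs for cluster i5, one each in hours 00, 04 and 23.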