[slurm-users] Deadlocks in slurmdbd logs

Wed Jun 19 11:34:33 UTC 2019

Hello,

Every day we see several deadlocks in our slurmdbd log file, and each deadlock is always accompanied by a failed "rollup" operation. An example is below.

We are running Slurm 18.08.0 on our cluster. As far as we know, these deadlocks are not adversely affecting the operation of the cluster: jobs "roll" through the cluster every day and utilisation stays consistently high. We also don't appear to be losing data in the database. I'm not a database expert, so I have no idea where to start with this. Our local database experts have taken a look and are also perplexed.

I wondered whether anyone in the community had any ideas. As an aside, I've just started to experiment with v19*, and it would be nice to think that these deadlocks will simply go away after an eventual upgrade (once that version is a bit more mature); however, that may not be the case.

Best regards,

David

[2019-06-19T00:00:02.728] error: mysql_query failed: 1213 Deadlock found when trying to get lock; try restarting transaction
insert into "i5_assoc_usage_hour_table"
.....

[2019-06-19T00:00:02.729] error: Couldn't add assoc hour rollup
[2019-06-19T00:00:02.729] error: Cluster i5 rollup failed
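To check how often this happens, and whether every deadlock really is paired with a rollup failure, a minimal sketch like the one below could tally both per day. It assumes only the log line format visible in the excerpt above (a "[YYYY-MM-DDTHH:MM:SS.mmm]" prefix, the "mysql_query failed: 1213" text for deadlocks, and "rollup failed" for failed rollups); the function name is hypothetical and this is not a Slurm tool.

```python
import re
from collections import Counter

# Matches the timestamp prefix seen in the slurmdbd excerpt, e.g.
# "[2019-06-19T00:00:02.728] error: ..." and captures the date and message.
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2})T[\d:.]+\] (.*)$")


def count_deadlocks_and_rollups(lines):
    """Hypothetical helper: return (deadlocks_per_day, rollup_failures_per_day)."""
    deadlocks = Counter()
    rollups = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            # Continuation lines (such as the SQL statement itself) have no prefix.
            continue
        day, message = m.groups()
        if "mysql_query failed: 1213" in message:
            deadlocks[day] += 1  # MySQL error 1213 = deadlock
        elif "rollup failed" in message:
            rollups[day] += 1  # e.g. "Cluster i5 rollup failed"
    return deadlocks, rollups
```

Passing an open file handle for the slurmdbd log (e.g. `with open(path) as f: count_deadlocks_and_rollups(f)`) works because the regex ignores the trailing newline on each line.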
