[slurm-users] SlurmDBD losing connection to the backend MariaDB

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Tue Nov 1 08:43:50 UTC 2022

Hi Brian,

On 11/1/22 05:28, Brian Andrus wrote:
> It caches up to a point. As I understand it, that is about an hour 
> (depending on size and how busy the cluster is, as well as available 
> memory, etc).

Have you found any documentation of slurmdbd caching? It is well known 
that slurmctld caches accounting information while slurmdbd is down; see, 
for example, page 30 of the talk "Field Notes Mark 2: Random Musings From 
Under A New Hat" [1] by Tim Wickberg, SchedMD:

> For slurmdbd, the critical element in the failure domain is
> MySQL, not slurmdbd. slurmdbd itself is stateless.
> ● slurmctld will cache accounting records (up to a limit) if
> slurmdbd is unavailable. This can be hours+ to days+
> depending on your system without data loss.

The statelessness of slurmdbd makes me think that it can't cache any data.
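For concreteness, here is a toy Python model of the behaviour Tim describes: 
slurmctld spools accounting records while slurmdbd is unreachable and 
replays them in order once it is back. It is a sketch, not Slurm's code: 
the class AccountingSpool and its drop-the-newest overflow policy are 
hypothetical. The default limit follows my reading of the MaxDBDMsgs entry 
in the slurm.conf man page (the greater of 10000 and 
MaxJobCount * 2 + node count * 4); please check the man page for your 
Slurm version.

```python
from collections import deque


def default_max_dbd_msgs(max_job_count: int, node_count: int) -> int:
    """Default MaxDBDMsgs per my reading of slurm.conf(5):
    the greater of 10000 and MaxJobCount * 2 + NodeCount * 4."""
    return max(10000, max_job_count * 2 + node_count * 4)


class AccountingSpool:
    """Hypothetical toy model (not Slurm code) of slurmctld holding
    accounting records while slurmdbd is unreachable."""

    def __init__(self, limit: int):
        self.limit = limit
        self.queue = deque()
        self.rejected = 0  # records that did not fit; real overflow policy not modeled

    def submit(self, record, dbd_up: bool, sink: list) -> None:
        if dbd_up and not self.queue:
            sink.append(record)          # slurmdbd reachable, no backlog: deliver now
        elif len(self.queue) < self.limit:
            self.queue.append(record)    # slurmdbd down or backlog pending: spool it
        else:
            self.rejected += 1           # simplification: drop the newest record

    def flush(self, sink: list) -> int:
        """Replay the backlog in FIFO order; return how many were delivered."""
        n = len(self.queue)
        while self.queue:
            sink.append(self.queue.popleft())
        return n


if __name__ == "__main__":
    db = []                              # stands in for the MariaDB tables
    spool = AccountingSpool(limit=3)     # tiny limit just to show overflow
    spool.submit("job1 start", dbd_up=True, sink=db)
    for r in ["job1 end", "job2 start", "job2 end", "job3 start"]:
        spool.submit(r, dbd_up=False, sink=db)   # slurmdbd (or MySQL) is down
    spool.flush(db)                      # slurmdbd is back
    print(db)              # ['job1 start', 'job1 end', 'job2 start', 'job2 end']
    print(spool.rejected)  # 1
    print(default_max_dbd_msgs(10000, 500))  # 22000
```

With the default MaxJobCount of 10000 on a 500-node cluster this gives 
22000 queued messages, so how many hours or days that covers depends on 
job throughput, which fits the "hours+ to days+" on Tim's slide. On a 
running cluster, sdiag reports the current "DBD Agent queue size".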


[1] https://slurm.schedmd.com/publications.html

> On 10/31/2022 9:20 PM, Richard Chang wrote:
>> Hi,
>> Just for my info, I would like to know what happens when SlurmDBD loses 
>> connection to the backend database, e.g. MariaDB.
>> Does it cache the accounting info and keep it until the DB comes back 
>> up, or does it panic and shut down?
