[slurm-users] SlurmDBD losing connection to the backend MariaDB
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Tue Nov 1 08:43:50 UTC 2022
Hi Brian,
On 11/1/22 05:28, Brian Andrus wrote:
> It caches up to a point. As I understand it, that is about an hour
> (depending on size and how busy the cluster is, as well as available
> memory, etc).
Have you found any documentation of slurmdbd caching? It's well-known
that slurmctld caches information while slurmdbd is down, see for example
page 30 in the talk "Field Notes Mark 2: Random Musings From Under A New
Hat"[1] by Tim Wickberg, SchedMD:
> For slurmdbd, the critical element in the failure domain is
> MySQL, not slurmdbd. slurmdbd itself is stateless.
> ● slurmctld will cache accounting records (up to a limit) if
> slurmdbd is unavailable. This can be hours+ to days+
> depending on your system without data loss.
The statelessness of slurmdbd makes me think that it can't cache any data.
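One way to watch the slurmctld-side cache while the database is down might be the "DBD Agent queue size" line printed by sdiag. Below is a minimal Python sketch that extracts that number from sdiag text. The helper name dbd_agent_queue_size and the sample excerpt are made up for illustration, and the exact sdiag layout may differ between Slurm versions.

```python
import re
from typing import Optional


def dbd_agent_queue_size(sdiag_output: str) -> Optional[int]:
    """Return the 'DBD Agent queue size' value found in sdiag text, or None.

    Hypothetical helper: it only parses text and does not talk to Slurm.
    """
    # Anchor on the start of the line so the plain "Agent queue size"
    # line is not matched by mistake.
    match = re.search(
        r"^\s*DBD Agent queue size:\s*(\d+)\s*$",
        sdiag_output,
        re.MULTILINE,
    )
    return int(match.group(1)) if match else None


# Made-up excerpt for illustration, not output from a real cluster.
sample = """\
Server thread count:  3
Agent queue size:     0
DBD Agent queue size: 42
"""

print(dbd_agent_queue_size(sample))
```

On a live controller the text could come from subprocess.run(["sdiag"], capture_output=True, text=True).stdout. As far as I recall, the upper limit on queued messages is set by MaxDBDMsgs in slurm.conf, but please check the man page for your Slurm version.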
Thanks,
Ole
[1] https://slurm.schedmd.com/publications.html
> On 10/31/2022 9:20 PM, Richard Chang wrote:
>> Hi,
>>
>> Just for my information, I would like to know what happens when SlurmDBD
>> loses its connection to the backend database, for example MariaDB.
>>
>> Does it cache the accounting info and keep it until the DB comes back
>> up, or does it panic and shut down?