[slurm-users] SlurmDBD losing connection to the backend MariaDB
Brian Andrus
toomuchit at gmail.com
Tue Nov 1 20:30:54 UTC 2022
Ole,
Fair enough, it is actually slurmctld that does the caching. Technical
typo on my part there.
Just trying to let the user know that there is only a limited window in
which the database has to come back before accounting information is lost.
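
A rough sketch of how one might keep an eye on that window, not a
definitive tool. It assumes that sdiag prints a line of the form
"DBD Agent queue size: N" and that the queue limit is the MaxDBDMsgs
value from slurm.conf. The 10000 below is only a placeholder, and the
function names are made up for illustration.

```python
import re
import subprocess

# Assumption: sdiag output contains a line like "DBD Agent queue size: 0".
QUEUE_RE = re.compile(r"DBD Agent queue size:\s*(\d+)")


def dbd_queue_size(sdiag_text):
    """Return the DBD agent queue size found in sdiag output, or None."""
    match = QUEUE_RE.search(sdiag_text)
    return int(match.group(1)) if match else None


def queue_fill_fraction(queue_size, max_dbd_msgs):
    """Fraction of the slurmctld message queue that is currently in use."""
    return queue_size / max_dbd_msgs


if __name__ == "__main__":
    # Placeholder limit: replace with the MaxDBDMsgs value for your site.
    MAX_DBD_MSGS = 10000
    text = subprocess.run(
        ["sdiag"], capture_output=True, text=True, check=True
    ).stdout
    size = dbd_queue_size(text)
    if size is None:
        print("DBD agent queue size not found in sdiag output")
    else:
        fill = queue_fill_fraction(size, MAX_DBD_MSGS)
        print(f"queued messages: {size} ({fill:.1%} of limit)")
```

If the fraction keeps climbing while the database is down, that is the
point to get MariaDB back before slurmctld starts discarding records.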
Brian Andrus
On 11/1/2022 1:43 AM, Ole Holm Nielsen wrote:
> Hi Brian,
>
> On 11/1/22 05:28, Brian Andrus wrote:
>> It caches up to a point. As I understand it, that is about an hour
>> (depending on size and how busy the cluster is, as well as available
>> memory, etc).
>
> Have you found any documentation of slurmdbd caching? It's well-known
> that slurmctld caches information while slurmdbd is down, see for
> example page 30 in the talk "Field Notes Mark 2: Random Musings From
> Under A New Hat"[1] by Tim Wickberg, SchedMD:
>
>> For slurmdbd, the critical element in the failure domain is
>> MySQL, not slurmdbd. slurmdbd itself is stateless.
>> ● slurmctld will cache accounting records (up to a limit) if
>> slurmdbd is unavailable. This can be hours+ to days+
>> depending on your system without data loss.
>
> The statelessness of slurmdbd makes me think that it can't cache any
> data.
>
> Thanks,
> Ole
>
> [1] https://slurm.schedmd.com/publications.html
>
>> On 10/31/2022 9:20 PM, Richard Chang wrote:
>>> Hi,
>>>
>>> Just for my info, I would like to know what happens when SlurmDBD
>>> loses connection to the backend database, e.g. MariaDB.
>>>
>>> Does it cache the accounting info and keep it until the DB comes
>>> back up, or does it panic and shut down?
>