[slurm-users] SlurmDBD losing connection to the backend MariaDB

Brian Andrus toomuchit at gmail.com
Tue Nov 1 20:30:54 UTC 2022


Ole,

Fair enough, it is actually slurmctld that does the caching. Technical 
typo on my part there.

Just trying to let the user know that there is a limited window in which 
the database has to come back before accounting information is lost.
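
A minimal sketch of how one might keep an eye on that window, assuming 
`sdiag` on the slurmctld host prints a "DBD Agent queue size: N" line (the 
count of accounting messages slurmctld is holding for slurmdbd). The 
function names and the `warn_at` threshold are made up for illustration; 
the real ceiling should be, I believe, MaxDBDMsgs in slurm.conf, so check 
that against your own version and configuration.

```python
import re
import subprocess

# Line format assumed from typical sdiag output; verify on your Slurm version.
_QUEUE_RE = re.compile(r"DBD Agent queue size:\s*(\d+)")


def dbd_agent_queue_size(sdiag_text):
    """Return the queued-message count from sdiag text, or None if absent."""
    match = _QUEUE_RE.search(sdiag_text)
    return int(match.group(1)) if match else None


def queue_is_filling(sdiag_text, warn_at=5000):
    """True when the queue has reached warn_at (a hypothetical threshold)."""
    size = dbd_agent_queue_size(sdiag_text)
    return size is not None and size >= warn_at


def read_sdiag():
    """Run sdiag and return its stdout; needs Slurm client tools installed."""
    result = subprocess.run(
        ["sdiag"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `queue_is_filling(read_sdiag())` from cron or a monitoring check 
would give some warning while the queue grows during an outage, rather 
than after records have already been dropped.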

Brian Andrus

On 11/1/2022 1:43 AM, Ole Holm Nielsen wrote:
> Hi Brian,
>
> On 11/1/22 05:28, Brian Andrus wrote:
>> It caches up to a point. As I understand it, that is about an hour 
>> (depending on size and how busy the cluster is, as well as available 
>> memory, etc).
>
> Have you found any documentation of slurmdbd caching?  It's well-known 
> that slurmctld caches information while slurmdbd is down, see for 
> example page 30 in the talk "Field Notes Mark 2: Random Musings From 
> Under A New Hat"[1] by Tim Wickberg, SchedMD:
>
>> For slurmdbd, the critical element in the failure domain is
>> MySQL, not slurmdbd. slurmdbd itself is stateless.
>> ● slurmctld will cache accounting records (up to a limit) if
>> slurmdbd is unavailable. This can be hours+ to days+
>> depending on your system without data loss.
>
> The statelessness of slurmdbd makes me think that it can't cache any 
> data.
>
> Thanks,
> Ole
>
> [1] https://slurm.schedmd.com/publications.html
>
>> On 10/31/2022 9:20 PM, Richard Chang wrote:
>>> Hi,
>>>
>>> Just for my info, I would like to know what happens when SlurmDBD 
>>> loses its connection to the backend database, for example MariaDB.
>>>
>>> Does it cache the accounting info and keep it until the DB comes 
>>> back up, or does it panic and shut down?
>


