[slurm-users] SlurmDBD losing connection to the backend MariaDB

Richard Chang rchang.lists at gmail.com
Wed Nov 2 01:49:42 UTC 2022


Does it mean it is best to use a single slurmdbd host in my case?

My primary slurmctld host is also the backup slurmdbd host. My worry is: 
if the primary slurmdbd host (which is also the MariaDB server) goes 
down, will the backup slurmdbd be able to cache data and wait until 
MariaDB is available again?

Thanks,

RC

On 11/2/2022 2:00 AM, Brian Andrus wrote:
> Ole,
>
> Fair enough, it is actually slurmctld that does the caching. Technical 
> typo on my part there.
>
> Just trying to let the user know that, during a database outage, there 
> is only a limited window in which they must recover to ensure no 
> information is lost.
>
> Brian Andrus
>
> On 11/1/2022 1:43 AM, Ole Holm Nielsen wrote:
>> Hi Brian,
>>
>> On 11/1/22 05:28, Brian Andrus wrote:
>>> It caches up to a point. As I understand it, that is about an hour 
>>> (depending on size and how busy the cluster is, as well as available 
>>> memory, etc).
>>
>> Have you found any documentation of slurmdbd caching?  It's 
>> well-known that slurmctld caches information while slurmdbd is down, 
>> see for example page 30 in the talk "Field Notes Mark 2: Random 
>> Musings From Under A New Hat"[1] by Tim Wickberg, SchedMD:
>>
>>> For slurmdbd, the critical element in the failure domain is
>>> MySQL, not slurmdbd. slurmdbd itself is stateless.
>>> ● slurmctld will cache accounting records (up to a limit) if
>>> slurmdbd is unavailable. This can be hours+ to days+
>>> depending on your system without data loss.
>>
>> The statelessness of slurmdbd makes me think that it can't cache any 
>> data.
>>
>> Thanks,
>> Ole
>>
>> [1] https://slurm.schedmd.com/publications.html
>>
>>> On 10/31/2022 9:20 PM, Richard Chang wrote:
>>>> Hi,
>>>>
>>>> Just for my info, I would like to know what happens when SlurmDBD 
>>>> loses connection to the backend database, for example, MariaDB.
>>>>
>>>> Does it cache the accounting info and keep it until the DB comes 
>>>> back up, or does it panic and shut down?
>>
>
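The behaviour described in this thread can be pictured with a toy model. slurmdbd itself is stateless, so a backup slurmdbd holds nothing. It is slurmctld that queues accounting records "up to a limit" while slurmdbd or its database is unreachable, then replays them once the database returns. The sketch below is a hypothetical, simplified model and not Slurm's implementation. The class and function names (`AccountingCache`, `record`, `flush`, `buffering_window_hours`) are illustrative. The "drop new records once full" policy, the cache limit, and the record rate are all assumptions. Slurm's real limit and discard behaviour are governed by its own configuration. The model also shows why answers ranged from "about an hour" to "hours+ to days+": the safe outage window is roughly the cache limit divided by how fast the cluster generates accounting records.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AccountingCache:
    """Hypothetical model of slurmctld holding accounting records while
    slurmdbd/MariaDB is down. NOT Slurm's code; limit and policy are assumed."""
    limit: int
    pending: deque = field(default_factory=deque)
    dropped: int = 0

    def record(self, msg: str, dbd_up: bool, sink: list) -> None:
        if dbd_up:
            # Database reachable: replay anything cached first, then write.
            self.flush(sink)
            sink.append(msg)
        elif len(self.pending) < self.limit:
            # Database unreachable: keep the record in memory.
            self.pending.append(msg)
        else:
            # Assumed policy: once the cache is full, new records are lost.
            self.dropped += 1

    def flush(self, sink: list) -> None:
        # Replay cached records in arrival order once the database is back.
        while self.pending:
            sink.append(self.pending.popleft())


def buffering_window_hours(limit: int, records_per_hour: float) -> float:
    """Rough outage length the cache can absorb before records are dropped."""
    return limit / records_per_hour


# Usage with made-up numbers: a cache of 3 records during a 5-record outage.
db: list = []
cache = AccountingCache(limit=3)
cache.record("job1 start", dbd_up=True, sink=db)
for i in range(2, 7):
    cache.record(f"job{i} start", dbd_up=False, sink=db)
cache.flush(db)
print(db)             # ['job1 start', 'job2 start', 'job3 start', 'job4 start']
print(cache.dropped)  # 2
print(buffering_window_hours(10000, 2500))  # 4.0 (hypothetical limit and rate)
```

In this model, a second slurmdbd host pointing at the same unavailable MariaDB server does not add any buffering. Only the size of the slurmctld-side cache and the record rate determine how long an outage can last without data loss.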


