Hi Daniel,

We run a simple Galera MySQL cluster and have HAProxy running on all clients to steer requests (round-robin) to one of the DB nodes that answers the health check properly.
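
A setup like this might be sketched as the HAProxy fragment below. This is an illustration under assumptions, not the configuration from this cluster: the node addresses, the server names, the timeouts and the `haproxy_check` MySQL user are all placeholders, and `option mysql-check` is only one possible health check.

```
# Hypothetical client-side HAProxy section; all names and addresses are placeholders.
listen galera
    bind 127.0.0.1:3306          # local clients connect here instead of to a DB node
    mode tcp
    balance roundrobin           # rotate over the nodes that currently pass the check
    timeout connect 5s
    timeout client  1h
    timeout server  1h
    # Assumes a password-less MySQL user named haproxy_check exists on every node.
    option mysql-check user haproxy_check
    server db1 192.0.2.11:3306 check
    server db2 192.0.2.12:3306 check
    server db3 192.0.2.13:3306 check
```

With something like this in place, slurmdbd would point `StorageHost` in slurmdbd.conf at 127.0.0.1. Note that `mysql-check` only verifies that the server answers; it does not check whether a Galera node is synced, so a stricter check may be preferable.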

Best,
Andreas 

On 23.01.2024 at 15:35, Daniel L'Hommedieu <dlhommedieu@gmail.com> wrote:

Xand,

Thanks - that’s great to hear.  I was thinking of using Anycast to achieve the same thing, but good to know that keepalived is a viable solution as well.

Best,
Daniel

On Jan 23, 2024, at 09:29, Xand Meaden <xand.meaden@kcl.ac.uk> wrote:

Hi,

We are using Percona XtraDB cluster to achieve HA for our Slurm databases. There is a single virtual IP that will be kept on one of the cluster's servers using keepalived.
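
As a rough illustration of the virtual IP part, a keepalived fragment along these lines could be used on each database server. It is a sketch under assumptions, not the configuration from this site: the interface name, router ID, priority, address and the check command are placeholders.

```
# Hypothetical keepalived.conf fragment; all values are placeholders.
vrrp_script chk_mysqld {
    script "/usr/bin/pgrep -x mysqld"   # assumed check: is a mysqld process running?
    interval 2                          # run the check every 2 seconds
    fall 2                              # 2 failures mark the node as faulty
    rise 2                              # 2 successes mark it as healthy again
}

vrrp_instance VI_SLURMDB {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100                        # give each server a different priority
    advert_int 1
    virtual_ipaddress {
        192.0.2.100/24                  # the single address slurmdbd connects to
    }
    track_script {
        chk_mysqld                      # release the VIP if the check fails
    }
}
```

slurmdbd would then set `StorageHost` in slurmdbd.conf to the virtual IP, so a failover only moves the address and needs no change on the Slurm side.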

Regards,
Xand

From: slurm-users <slurm-users-bounces@lists.schedmd.com> on behalf of Daniel L'Hommedieu <dlhommedieu@gmail.com>
Sent: 22 January 2024 17:23
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Database cluster
 
Community:

What do you do to ensure database reliability in your SLURM environment?  We can have multiple controllers and multiple slurmdbds, but my understanding is that slurmdbd can only be configured with a single MySQL server, so what do you do?  Do you have that “single MySQL server” be a cluster, such as Percona XtraDB?  Do you use MySQL replication, then manually switch slurmdbd to a replication slave if the master goes down?  Do you do something else?

Thanks.

Daniel