[slurm-users] HA for slurmdbd

Brian Andrus toomuchit at gmail.com
Tue Feb 15 18:17:40 UTC 2022


There hasn't been much effort to make slurmdbd as resilient as you 
are hinting at, because there has been no need.

The database itself can be made resilient to keep the data safe. 
Data that cannot be written to the database is cached until a slurmdbd 
becomes available again, even if that means failing over to the 
AccountingStorageBackupHost (set in slurm.conf). So the only potential 
'loss' is immediate access to data that may still be sitting in a cache 
until a slurmdbd server is reachable again.
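To make the caching behavior concrete, here is a toy Python model of 
store-and-forward with a primary and a backup. It is not Slurm's actual 
implementation; `AccountingForwarder` and the server callables are 
hypothetical stand-ins for slurmctld and the two slurmdbd hosts. Records 
stay queued, in order, until one of the servers accepts them.

```python
from collections import deque


class AccountingForwarder:
    """Hypothetical toy model (not Slurm code) of store-and-forward.

    Records are queued locally while no accounting server accepts them
    and are flushed in order once one does."""

    def __init__(self, servers):
        # servers: ordered list of callables standing in for the primary
        # and backup slurmdbd; each returns True if it accepted the record.
        self.servers = servers
        self.pending = deque()

    def submit(self, record):
        # Always queue first, then try to drain the queue.
        self.pending.append(record)
        self.flush()

    def flush(self):
        while self.pending:
            record = self.pending[0]
            # any() short-circuits: the primary is tried before the backup.
            if not any(send(record) for send in self.servers):
                return  # nothing reachable: keep the record cached
            self.pending.popleft()  # accepted, so drop it from the cache
```

For example, with both servers down, two submitted records stay in 
`pending`; once the backup comes up, a `flush()` delivers both to it in 
the original order, so nothing is lost, only delayed.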

You can have multiple slurmdbd servers running and point any system to 
whichever one you like. In that respect, a simple approach would be to 
put round-robin DNS or a load balancer in front of the slurmdbd servers 
and have clients connect through that.
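Whatever sits in front (round-robin DNS, a load balancer, or a Keepalived 
floating IP) needs some way to tell whether a slurmdbd is up. A minimal 
sketch, assuming the default slurmdbd port of 6819, is a plain TCP 
connect check. `dbd_reachable` and `pick_dbd` are hypothetical helpers, 
not part of Slurm.

```python
import socket

DEFAULT_DBD_PORT = 6819  # slurmdbd's default DbdPort; adjust if yours differs


def dbd_reachable(host, port=DEFAULT_DBD_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    This only shows that something is listening on the port; it does not
    verify that slurmdbd is healthy or can reach its database."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_dbd(hosts, port=DEFAULT_DBD_PORT):
    """Return the first host that accepts a connection, mimicking a simple
    active/passive front end, or None if none of them answer."""
    for host in hosts:
        if dbd_reachable(host, port):
            return host
    return None
```

For example, `pick_dbd(["dbd1", "dbd2"])` (hypothetical host names) 
returns the first host that answers, or `None`. A load balancer health 
check or a Keepalived track script would run something equivalent to 
`dbd_reachable` against each backend; a fuller check would also confirm 
that slurmdbd can talk to the database.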

Brian Andrus

On 2/15/2022 7:46 AM, Xand Meaden wrote:
> Hello,
>
> I'm wondering what others are doing to make their slurmdbd service 
> resilient. We have the following setup right now:
>
> - two VMs running slurmctld (and also slurmdbd)
> - shared storage for StateSaveLocation using CephFS
> - three-way mysql cluster using Percona XtraDB
>
> However I can see no "Slurm native" way to make slurmdbd resilient - 
> there is no option for a backup server in slurm.conf. I naively tried 
> setting the AccountingStorageHost to "localhost" but this only worked 
> on the primary control node.
>
> Can we use something like Keepalived to present slurmdbd running on 
> both control nodes via a floating IP, or will this cause complications 
> with Slurm's use of it?
>
> Thanks for any advice,
> Xand