[slurm-users] HA for slurmdbd

Xand Meaden xand.meaden at kcl.ac.uk
Tue Feb 15 15:46:55 UTC 2022


I'm wondering what others are doing to make their slurmdbd service 
resilient. We have the following setup right now:

- two VMs running slurmctld (and also slurmdbd)
- shared storage for StateSaveLocation using CephFS
- three-node MySQL cluster using Percona XtraDB

However, I can see no "Slurm native" way to make slurmdbd resilient: as 
far as I can tell, there is no option for a backup server in slurm.conf. 
I naively tried setting AccountingStorageHost to "localhost", but this 
only worked on the primary control node.

Can we use something like Keepalived to present slurmdbd running on both 
control nodes via a floating IP, or will this cause complications with 
the way Slurm talks to slurmdbd?
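
To make the question concrete, the Keepalived side might look roughly 
like the sketch below. Everything in it is a placeholder assumption 
rather than our actual configuration: the interface name, the 
virtual_router_id, the priority, the address (taken from the 192.0.2.0/24 
documentation range) and the systemctl-based health check. slurm.conf on 
both control nodes would then set AccountingStorageHost to a name that 
resolves to the floating address. Whether Slurm is happy with slurmdbd 
running on both nodes behind that address is exactly what I am unsure 
about.

```
# Hypothetical keepalived.conf fragment for one control node (sketch only).

# Health check: treat this node as healthy only while slurmdbd is active.
vrrp_script chk_slurmdbd {
    script "/usr/bin/systemctl is-active --quiet slurmdbd"
    interval 5      # run the check every 5 seconds
    fall 2          # two consecutive failures mark the node as down
    rise 2          # two consecutive successes mark it as up again
}

vrrp_instance VI_SLURMDBD {
    state BACKUP            # both nodes start as BACKUP; priority elects the master
    interface eth0          # assumed interface name
    virtual_router_id 51    # assumed ID, must match on both nodes
    priority 100            # use a lower value on the second node
    advert_int 1
    virtual_ipaddress {
        192.0.2.10/24       # placeholder floating IP for slurmdbd
    }
    track_script {
        chk_slurmdbd        # release the floating IP if slurmdbd stops
    }
}
```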

Thanks for any advice,
