[slurm-users] Slurmdbd High Availability
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Thu Apr 13 11:16:34 UTC 2023
On 4/13/23 11:49, Shaghuf Rahman wrote:
> I am setting up slurmdbd on my system and I need some input.
>
> My current setup is:
> server1: 192.168.123.12 (slurmctld)
> server2: 192.168.123.13 (slurmctld)
> server3: 192.168.123.14 (slurmdbd), which both server1 and server2
> point to.
> database: MySQL
>
> I have one more server, server4: 192.168.123.15, which I need to make
> a secondary database server. I want to configure server4 so that it
> syncs the database and runs slurmdbd as either Active-Active or
> Active-Passive.
>
> Could anyone please help me with the *steps* to configure this, and with
> how I can *sync* my *database* across both servers simultaneously?
Slurm administrators have different opinions about the usefulness versus
complexity of HA setups. You could read SchedMD's presentation, starting
at page 38: https://slurm.schedmd.com/SLUG19/Field_Notes_3.pdf
Some noteworthy slides state:
> Separating slurmctld and slurmdbd in normal production use
> is recommended.
> Master/backup slurmctld is common, and - as long as the
> performance for StateSaveLocation is kept high - not that
> difficult to implement.
> For slurmdbd, the critical element in the failure domain is
> MySQL, not slurmdbd. slurmdbd itself is stateless.
> IMNSHO, the additional complexity of a redundant MySQL
> deployment is more likely to cause an outage than it is to
> prevent one.
> So don’t bother setting up a redundant slurmdbd, keep
> slurmdbd + MySQL local to a single server.
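As a rough sketch of that last recommendation applied to your hosts, assuming
slurmdbd and MySQL both run on server3 and server4 is left out of the
accounting path: the file path, hostnames as Slurm node names, database name,
user, and password below are placeholders I made up, not values from your
site, so please check everything against the slurmdbd.conf and slurm.conf man
pages for your Slurm version.

```
# slurmdbd.conf on server3 (location is an assumption, e.g. /etc/slurm/)
DbdHost=server3              # short hostname of the machine running slurmdbd
DbdAddr=192.168.123.14       # address taken from your description
SlurmUser=slurm              # assumed service account
StorageType=accounting_storage/mysql
StorageHost=localhost        # MySQL runs on the same server as slurmdbd
StorageLoc=slurm_acct_db     # assumed database name
StorageUser=slurm            # assumed MySQL user
StoragePass=CHANGE_ME        # placeholder, set your own

# slurm.conf on server1 and server2 (both slurmctld hosts)
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=server3
```

If you later decide that you do want a standby slurmdbd, slurmdbd.conf has
DbdBackupHost and slurm.conf has AccountingStorageBackupHost. A second
slurmdbd would typically still talk to the same MySQL database, though, and
making that database redundant is exactly the part the slides warn about.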
I hope this helps.
/Ole