[slurm-users] Slurmdbd High Availability

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Thu Apr 13 11:16:34 UTC 2023


On 4/13/23 11:49, Shaghuf Rahman wrote:
> I am setting up slurmdbd on my system and I need some input.
> 
> My current setup is:
> server1: 192.168.123.12 (slurmctld)
> server2: 192.168.123.13 (slurmctld)
> server3: 192.168.123.14 (slurmdbd), which is pointing to both server1 and 
> server2.
> database: MySQL
> 
> I have one more server, server4 (192.168.123.15), which I need to make a 
> secondary database server. I want to configure server4 so that it syncs 
> the database and runs slurmdbd as either Active-Active or Active-Passive.
> 
> Could anyone please help me with the *steps* to configure this, and with 
> how to *sync* my *database* on both servers simultaneously?

Slurm administrators have different opinions about whether the usefulness 
of HA setups outweighs their complexity.  You could read SchedMD's 
presentation from page 38 onwards: 
https://slurm.schedmd.com/SLUG19/Field_Notes_3.pdf

Some noteworthy slides state:

> Separating slurmctld and slurmdbd in normal production use
> is recommended.
> Master/backup slurmctld is common, and - as long as the
> performance for StateSaveLocation is kept high - not that
> difficult to implement.

> For slurmdbd, the critical element in the failure domain is
> MySQL, not slurmdbd. slurmdbd itself is stateless.

> IMNSHO, the additional complexity of a redundant MySQL
> deployment is more likely to cause an outage than it is to
> prevent one.
> So don’t bother setting up a redundant slurmdbd, keep
> slurmdbd + MySQL local to a single server.
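
Applied to the hosts in your mail, the layout recommended in the slides 
could look roughly like the fragment below.  This is only a sketch, not a 
complete pair of configuration files: it assumes the hostnames server1 to 
server3 resolve to the addresses you listed, and the StateSaveLocation 
path, the database user, the password and the database name are 
placeholders that I made up for illustration.

```
# --- slurm.conf (identical on server1 and server2) ---
# The first SlurmctldHost is the primary controller, the second the backup.
SlurmctldHost=server1
SlurmctldHost=server2
# Placeholder path: must be fast storage readable and writable by both
# controllers.
StateSaveLocation=/shared/slurm/state
# Both controllers send accounting data to the single slurmdbd on server3.
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=server3

# --- slurmdbd.conf (on server3 only) ---
DbdHost=server3
StorageType=accounting_storage/mysql
# MySQL runs on the same machine as slurmdbd, as the slides suggest.
StorageHost=localhost
# Placeholder credentials and database name.
StorageUser=slurm
StoragePass=CHANGE_ME
StorageLoc=slurm_acct_db
```

With this layout only slurmctld is redundant; slurmdbd and MySQL stay 
together on server3, and server4 is not used for a second slurmdbd.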

I hope this helps.

/Ole


