Hi Diego.
In our setup, the database is critical. We have some wrapper scripts that consult the database for information, and we also set environment variables on login, based on user/partition associations. If the database is down, none of those things work.
I doubt there is appetite in the organization to change the way our setup works, but improving database reliability would be a good solution. Mostly I am interested in protecting against hardware failure, which is why I'm interested in a cluster solution such as XtraDB.
Thanks.
Daniel
On Jan 23, 2024, at 03:23, Diego Zuccato <diego.zuccato@unibo.it> wrote:
IIUC, the database is not "critical": if it goes down, you only lose access to some statistics. Job data gets cached anyway, and the DB is updated when it comes back online.
Diego
On 22/01/2024 18:23, Daniel L'Hommedieu wrote:
Community:
What do you do to ensure database reliability in your SLURM environment? We can have multiple controllers and multiple slurmdbds, but my understanding is that slurmdbd can be configured with only a single MySQL server, so what do you do? Do you make that "single MySQL server" a cluster, such as Percona XtraDB? Do you use MySQL replication, then manually switch slurmdbd to a replication slave if the master goes down? Do you do something else?
Thanks.
Daniel
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786