[slurm-users] Setup for backup slurmctld
Chris Samuel
chris at csamuel.org
Sun Mar 1 00:58:41 UTC 2020
On Wednesday, 26 February 2020 12:48:26 PM PST Joshua Baker-LePain wrote:
> We're planning the migration of our moderately sized cluster (~400 nodes,
> 40K jobs/day) from SGE to slurm. We'd very much like to have a backup
> slurmctld, and it'd be even better if our backup slurmctld could be in a
> separate data center from the primary (though they'd still be on the same
> private network). So, how are folks sharing the StateSaveLocation in such
> a setup? Any and all recommendations (including those with the 2
> slurmctld servers in the same rack) welcome. Thanks!
We use GPFS for our shared state directory (Cori is 12K nodes and we put
5K-30K jobs a day through it, with a very variable job mix). The important
thing is the IOPS rate of the filesystem: if it can't keep up with Slurm,
you're going to see performance issues.
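Because the filesystem's small-write rate is what matters, one rough way to compare candidate filesystems for StateSaveLocation before migrating is a quick probe like the Python sketch below. It assumes that a state save looks roughly like "write a small file, fsync it, rename it over the previous copy"; the function name, the `iops_probe_state` file name, and the 4 KiB payload size are illustrative choices, not anything Slurm itself defines. A single-client number like this won't show contention from other GPFS users, so treat it as a relative comparison only.

```python
import os
import time


def probe_state_dir_iops(directory: str, ops: int = 200, size: int = 4096) -> float:
    """Return approximate write+fsync+rename operations per second in `directory`.

    Hypothetical probe: each operation writes `size` bytes to a ".new" file,
    fsyncs it, and atomically renames it over a fixed target name. This is a
    rough stand-in for a state-save workload, not Slurm's own I/O code.
    """
    if ops <= 0:
        raise ValueError("ops must be positive")
    payload = os.urandom(size)
    target = os.path.join(directory, "iops_probe_state")  # illustrative name
    scratch = target + ".new"
    start = time.perf_counter()
    for _ in range(ops):
        # Write the new copy of the "state" and force it to stable storage.
        with open(scratch, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        # Atomically replace the previous copy, as a crash-safe save would.
        os.replace(scratch, target)
    elapsed = time.perf_counter() - start
    os.remove(target)  # leave the directory as we found it
    return ops / elapsed if elapsed > 0 else float("inf")
```

For example, run `probe_state_dir_iops("/path/to/shared/fs/iops-test")` from each prospective slurmctld host against a scratch directory on the shared filesystem, and compare the results with the same call against local disk.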
Tim from SchedMD has some notes on HA (among other things) from the 2017 Slurm
User Group meeting (SLUG17): https://slurm.schedmd.com/SLUG17/FieldNotes.pdf
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA