[slurm-users] Setup for backup slurmctld

Chris Samuel chris at csamuel.org
Sun Mar 1 00:58:41 UTC 2020

On Wednesday, 26 February 2020 12:48:26 PM PST Joshua Baker-LePain wrote:

> We're planning the migration of our moderately sized cluster (~400 nodes,
> 40K jobs/day) from SGE to slurm.  We'd very much like to have a backup
> slurmctld, and it'd be even better if our backup slurmctld could be in a
> separate data center from the primary (though they'd still be on the same
> private network).  So, how are folks sharing the StateSaveLocation in such
> a setup?  Any and all recommendations (including those with the 2
> slurmctld servers in the same rack) welcome.  Thanks!

We use GPFS for our shared state directory (Cori is 12K nodes and we put
5K-30K jobs a day through it, with a very variable job mix). The important
thing is the IOPS rate of the filesystem: if it can't keep up with Slurm,
you're going to see performance issues.
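One rough way to check whether a candidate shared filesystem can keep up is to time small synced writes in a scratch directory on it. The sketch below is only an approximation under stated assumptions. It is not a Slurm tool and does not reproduce slurmctld's real I/O mix. The function name synced_write_rate, the 4 KiB file size and the write-then-rename pattern are illustrative choices.

```python
import os
import sys
import tempfile
import time


def synced_write_rate(directory: str, count: int = 200, size: int = 4096) -> float:
    """Return synced write-then-rename operations per second in `directory`.

    Assumption: small files written, fsync'd and renamed into place are a
    reasonable stand-in for a state-save workload. This is illustrative only.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    payload = os.urandom(size)
    target = os.path.join(directory, "bench_state")  # hypothetical file name
    tmp = target + ".new"
    start = time.perf_counter()
    for _ in range(count):
        # Write the payload and force it to stable storage before renaming.
        with open(tmp, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        # Atomically replace the previous copy, as a state writer might.
        os.replace(tmp, target)
    elapsed = time.perf_counter() - start
    os.remove(target)  # clean up the benchmark file
    return count / elapsed


if __name__ == "__main__":
    # Point this at a scratch subdirectory on the shared filesystem,
    # never at a live StateSaveLocation.
    path = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(f"{synced_write_rate(path):.0f} synced write+rename ops/s in {path}")
```

Run it with the same scratch path from both controller hosts, and compare the result against local disk. A large gap is a hint that the filesystem may become the bottleneck, but it is not a substitute for testing with a real slurmctld load.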

Tim from SchedMD had some notes on HA (and other things) from the 2017 Slurm
User Group (SLUG17): https://slurm.schedmd.com/SLUG17/FieldNotes.pdf

All the best,
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
