[slurm-users] Slurm Multi-cluster implementation
tina.friedrich at it.ox.ac.uk
Thu Oct 28 16:28:37 UTC 2021
I have the database on a separate server (it runs the database and the
database only). The login nodes run nothing SLURM related, they simply
have the binaries installed & a SLURM config.
I've never looked into having multiple databases & using
AccountingStorageExternalHost (in fact I'd forgotten you could do that),
so I can't comment on it (maybe someone else can). I think it works,
but as I said I've never tested it (I didn't see much point in running
multiple databases if one would do the job).
I actually have specific login nodes for each of my clusters, to make it
easier for users (especially those with not much experience using the
HPC environment); so I have one login node connecting to cluster 1 and
one connecting to cluster 2.
I think the relevant config entries in slurm.conf on the login nodes
(if I'm not mistaken) are probably:
The differences in the slurm config files (that haven't got to do with
topology & nodes & scheduler tuning) are
(where IP_OF_SLURM_CONTROLLER is the IP address of host cluster1-slurm,
same for cluster2)
And then they have common entries for the AccountingStorageHost:
(slurm-db-prod is simply the hostname of the SLURM database server)
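For illustration only (this is not the actual configuration from the
clusters described above), a minimal sketch of what such entries could
look like. The ClusterName values and the cluster2-slurm hostname are
assumptions; IP_OF_SLURM_CONTROLLER is a placeholder, and older configs
may use ControlMachine/ControlAddr instead of SlurmctldHost:

```
# Hypothetical sketch, not the poster's real slurm.conf.

# Per-cluster entries, cluster 1 (controller, nodes, and its login node):
ClusterName=cluster1
SlurmctldHost=cluster1-slurm(IP_OF_SLURM_CONTROLLER)

# Per-cluster entries, cluster 2 (assumed names, shown commented out):
# ClusterName=cluster2
# SlurmctldHost=cluster2-slurm(IP_OF_SLURM_CONTROLLER)

# Common to both clusters: a single shared slurmdbd host
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=slurm-db-prod
```

With something like this in place, each login node talks to its own
controller by default, and both clusters register with the same slurmdbd.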
Does that help?
On 28/10/2021 14:59, navin srivastava wrote:
> Thank you Tina.
> So if I understood correctly, the database is global to both clusters and
> running on the login node?
> Or is the database running on one of the master nodes and shared with
> the other master node?
> As far as I have read, the slurm database can also be separate on
> each master, using the parameter
> AccountingStorageExternalHost so that both databases are aware of each
> other.
> Also, which slurmctld does the slurm.conf file on the login node point to?
> Is it possible to share a sample slurm.conf file from the login node?
> On Thu, Oct 28, 2021 at 7:06 PM Tina Friedrich
> <tina.friedrich at it.ox.ac.uk> wrote:
> Hi Navin,
> well, I have two clusters & login nodes that allow access to both. Will
> that do? I don't think a third would make any difference in setup.
> They need to share a database. As long as they share a database, the
> clusters have 'knowledge' of each other.
> So if you set up one database server (running slurmdbd), and then a
> SLURM controller for each cluster (running slurmctld) using that one
> central database, the '-M' option should work.
> On 28/10/2021 10:54, navin srivastava wrote:
> > Hi,
> > I am looking for a stepwise guide to set up a multi-cluster
> > implementation.
> > We want to set up 3 clusters and one login node to run jobs using the
> > -M cluster option.
> > Does anybody have such a setup and can share some insight into how it
> > works and whether it is really a stable solution?
> > Regards
> > Navin.
> Tina Friedrich, Advanced Research Computing Snr HPC Systems
> Research Computing and Support Services
> IT Services, University of Oxford
> http://www.arc.ox.ac.uk
> http://www.it.ox.ac.uk
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
Research Computing and Support Services
IT Services, University of Oxford