[slurm-users] Slurm Multi-cluster implementation

navin srivastava navin.altair at gmail.com
Thu Oct 28 16:35:05 UTC 2021


Thank you, Tina.
It will really help.

Regards
Navin

On Thu, Oct 28, 2021, 22:01 Tina Friedrich <tina.friedrich at it.ox.ac.uk>
wrote:

> Hello,
>
> I have the database on a separate server (it runs the database and the
> database only). The login nodes run nothing SLURM related, they simply
> have the binaries installed & a SLURM config.
>
> I've never looked into having multiple databases & using
> AccountingStorageExternalHost (in fact I'd forgotten you could do that),
> so I can't comment on that (maybe someone else can); I think that works,
> yes, but as I said never tested that (didn't see much point in running
> multiple databases if one would do the job).
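>
> (I've not tried it, but from the slurm.conf documentation that parameter
> takes a comma separated list of the other slurmdbd hosts, so roughly
> something like
>
> AccountingStorageExternalHost=other-cluster-dbd:6819
>
> where "other-cluster-dbd" is a placeholder for the other cluster's
> slurmdbd host and 6819 whatever port that slurmdbd listens on. Can't
> vouch for it, though.)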
>
> I actually have specific login nodes for each of my clusters, to make it
> easier for users (especially those without much experience using the
> HPC environment); so I have one login node connecting to cluster 1 and
> one connecting to cluster 2.
>
> I think the relevant config entries (if I'm not mistaken) on the login
> nodes are probably these.
>
> The differences between the two slurm config files (that haven't got to
> do with topology & nodes & scheduler tuning) are:
>
> ClusterName=cluster1
> ControlMachine=cluster1-slurm
> ControlAddr=IP_OF_SLURM_CONTROLLER
>
> ClusterName=cluster2
> ControlMachine=cluster2-slurm
> ControlAddr=IP_OF_SLURM_CONTROLLER
>
> (where IP_OF_SLURM_CONTROLLER is the IP address of host cluster1-slurm,
> same for cluster2)
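>
> (As an aside - on newer Slurm versions the same can, if I'm not
> mistaken, be written with SlurmctldHost instead of
> ControlMachine/ControlAddr, e.g.
>
> SlurmctldHost=cluster1-slurm(IP_OF_SLURM_CONTROLLER)
>
> but the older form above does the same job.)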
>
> And then they have common entries for the accounting storage:
>
> AccountingStorageHost=slurm-db-prod
> AccountingStorageBackupHost=slurm-db-prod
> AccountingStoragePort=7030
> AccountingStorageType=accounting_storage/slurmdbd
>
> (slurm-db-prod is simply the hostname of the SLURM database server)
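>
> Once both slurmctlds register with that one slurmdbd, both clusters
> should show up in the accounting database, which you can check from any
> node that has the client tools installed:
>
> sacctmgr show clusters
>
> and 'squeue -M all' should then list jobs from both clusters.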
>
> Does that help?
>
> Tina
>
> On 28/10/2021 14:59, navin srivastava wrote:
> > Thank you Tina.
> >
> > So if I understood correctly, the database is global to both clusters
> > and running on the login node?
> > Or is the database running on one of the master nodes and shared with
> > the other master node?
> >
> > But as far as I have read, the Slurm database can also be separate on
> > each master, just using the parameter
> > AccountingStorageExternalHost so that both databases are aware of each
> > other.
> >
> > Also, which slurmctld does the slurm.conf file on the login node point to?
> > Is it possible to share a sample slurm.conf file from a login node?
> >
> > Regards
> > Navin.
> >
> > On Thu, Oct 28, 2021 at 7:06 PM Tina Friedrich
> > <tina.friedrich at it.ox.ac.uk> wrote:
> >
> >     Hi Navin,
> >
> >     well, I have two clusters & login nodes that allow access to both.
> >     That do? I don't think a third would make any difference in setup.
> >
> >     They need to share a database. As long as they share a database, the
> >     clusters have 'knowledge' of each other.
> >
> >     So if you set up one database server (running slurmdbd), and then a
> >     SLURM controller for each cluster (running slurmctld) using that one
> >     central database, the '-M' option should work.
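> >
> >     (The -M / --clusters flag on the usual commands, e.g.
> >     'squeue -M all' or 'sbatch -M cluster2 <script>', is then what
> >     sends the request to the right slurmctld via that shared database;
> >     <script> is just a placeholder for whatever batch script you submit.)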
> >
> >     Tina
> >
> >     On 28/10/2021 10:54, navin srivastava wrote:
> >      > Hi ,
> >      >
> >      > I am looking for a stepwise guide to set up a multi-cluster
> >      > implementation.
> >      > We want to set up 3 clusters and one login node to run jobs using
> >      > the -M cluster option.
> >      > Does anybody have such a setup, and can you share some insight into
> >      > how it works and whether it is really a stable solution?
> >      >
> >      >
> >      > Regards
> >      > Navin.
> >
> >     --
> >     Tina Friedrich, Advanced Research Computing Snr HPC Systems
> >     Administrator
> >
> >     Research Computing and Support Services
> >     IT Services, University of Oxford
> >     http://www.arc.ox.ac.uk
> >     http://www.it.ox.ac.uk
> >
>
> --
> Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
>
> Research Computing and Support Services
> IT Services, University of Oxford
> http://www.arc.ox.ac.uk http://www.it.ox.ac.uk
>
>