[slurm-users] Slurm Multi-cluster implementation

Tina Friedrich tina.friedrich at it.ox.ac.uk
Thu Oct 28 16:28:37 UTC 2021


Hello,

I have the database on a separate server (it runs the database and the 
database only). The login nodes run nothing SLURM related, they simply 
have the binaries installed & a SLURM config.

I've never looked into having multiple databases & using 
AccountingStorageExternalHost (in fact I'd forgotten you could do that), 
so I can't comment on that (maybe someone else can); I think that works, 
yes, but as I said never tested that (didn't see much point in running 
multiple databases if one would do the job).

I actually have specific login nodes for both of my clusters, to make it 
easier for users (especially those with not much experience using the 
HPC environment); so I have one login node connecting to cluster 1 and 
one connecting to cluster 2.

I think the relevant slurm.conf entries on the login nodes (if I'm not 
mistaken) are the ones that differ between the two clusters' config 
files (apart from topology, nodes & scheduler tuning):

ClusterName=cluster1
ControlMachine=cluster1-slurm
ControlAddr=/IP_OF_SLURM_CONTROLLER/

ClusterName=cluster2
ControlMachine=cluster2-slurm
ControlAddr=/IP_OF_SLURM_CONTROLLER/

(where IP_OF_SLURM_CONTROLLER is the IP address of host cluster1-slurm, 
same for cluster2)

And then they have common entries for the AccountingStorageHost:

AccountingStorageHost=slurm-db-prod
AccountingStorageBackupHost=slurm-db-prod
AccountingStoragePort=7030
AccountingStorageType=accounting_storage/slurmdbd

(slurm-db-prod is simply the hostname of the SLURM database server)
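
With both controllers registered against the same slurmdbd, a quick check from a login node might look like the sketch below. The cluster names are the ones from the config above; job.sh is a hypothetical batch script, and the run() helper is only there so the script prints the commands instead of failing on a machine without the SLURM client tools.

```shell
#!/bin/sh
# Sketch only: cluster names come from the config above;
# job.sh is a hypothetical batch script.
CLUSTERS="cluster1,cluster2"

# Run the command if it is installed, otherwise just print
# what would have been run (dry run).
run() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Both clusters should be listed in the shared accounting database.
run sacctmgr show cluster

# Partition and node state for both clusters from one login node.
run sinfo -M "$CLUSTERS"

# Submit to a specific cluster, then list your own jobs on both.
run sbatch -M cluster2 job.sh
run squeue -M "$CLUSTERS" -u "$(id -un)"
```

If sacctmgr only lists one cluster, the other slurmctld is most likely not talking to the shared database yet, and '-M' will not be able to find it.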

Does that help?

Tina

On 28/10/2021 14:59, navin srivastava wrote:
> Thank you Tina.
> 
> So, if I understood correctly, the database is global to both clusters 
> and running on the login node?
> Or is the database running on one of the master nodes and shared with 
> the other master node?
> 
> But as far as I have read, each master can also have its own separate 
> Slurm database, using the parameter AccountingStorageExternalHost so 
> that the databases are aware of each other.
> 
> Also, which slurmctld does the slurm.conf file on the login node point to?
> Is it possible to share a sample slurm.conf file from the login node?
> 
> Regards
> Navin.
> 
> On Thu, Oct 28, 2021 at 7:06 PM Tina Friedrich 
> <tina.friedrich at it.ox.ac.uk> wrote:
> 
>     Hi Navin,
> 
>     well, I have two clusters & login nodes that allow access to both. That
>     do? I don't think a third would make any difference in setup.
> 
>     They need to share a database. As long as they share a database, the
>     clusters have 'knowledge' of each other.
> 
>     So if you set up one database server (running slurmdbd), and then a
>     SLURM controller for each cluster (running slurmctld) using that one
>     central database, the '-M' option should work.
> 
>     Tina
> 
>     On 28/10/2021 10:54, navin srivastava wrote:
>      > Hi ,
>      >
>      > I am looking for a stepwise guide to set up a multi-cluster
>      > implementation. We want to set up 3 clusters and one login node to
>      > run jobs using the -M cluster option.
>      > Does anybody have such a setup, and can you share some insight into
>      > how it works and whether it is really a stable solution?
>      >
>      >
>      > Regards
>      > Navin.
> 
>     -- 
>     Tina Friedrich, Advanced Research Computing Snr HPC Systems
>     Administrator
> 
>     Research Computing and Support Services
>     IT Services, University of Oxford
>     http://www.arc.ox.ac.uk
>     http://www.it.ox.ac.uk
> 

-- 
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator

Research Computing and Support Services
IT Services, University of Oxford
http://www.arc.ox.ac.uk http://www.it.ox.ac.uk


