[slurm-users] Migration of slurm communication network / Steps / how to

Purvesh Parmar purveshp0507 at gmail.com
Mon Apr 24 06:56:36 UTC 2023


Thank you, will try this and get back. Is any other step missing here
for the migration?


Thank you,


Purvesh
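
For the state-directory backup suggested in the quoted reply below, a minimal sketch in Python could look like this. The function name `backup_state_dir` and the archive name are illustrative assumptions, not part of Slurm; the sketch assumes slurmctld and slurmdbd are already stopped, so the files are not changing while they are archived.

```python
import tarfile
from pathlib import Path


def backup_state_dir(state_dir, archive_path):
    """Write a gzip tar-ball of state_dir and return the archived member names.

    Hypothetical helper: run it only after the Slurm daemons are stopped.
    """
    state_dir = Path(state_dir)
    # Store paths relative to the directory name so the archive
    # unpacks cleanly under a different parent on the new host.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(state_dir, arcname=state_dir.name)
    # Re-open the archive to confirm it is readable before transferring it.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

Calling `backup_state_dir("/var/spool/slurmctld", "slurmctld-state.tar.gz")` would archive the StateSaveLocation named in the thread; whether the restored state works under a new controller name is exactly what the test migration is meant to find out.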

On Mon, 24 Apr 2023 at 12:08, Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>
wrote:

> On 4/24/23 08:09, Purvesh Parmar wrote:
> > thank you, however, because this is a change of data center and the
> > server hostnames and FQDNs contain the data center name, I have to
> > change both the hostnames and the IP addresses to the hostnames given
> > for the new DC.
>
> Could your data center be persuaded to introduce DNS CNAME aliases for the
> old names to point to the new DC names?
>
> If you're forced to use new DNS names only, then it's simple to change DNS
> names of compute nodes and partitions in slurm.conf:
>
> NodeName=...
> PartitionName=xxx Nodes=...
>
> as well as the slurmdb server name:
>
> AccountingStorageHost=...
>
> What I have never tried before is to change the DNS name of the slurmctld
> host:
>
> ControlMachine=...
>
> The critical aspect here is that you need to stop all batch jobs, plus
> slurmdbd and slurmctld.  Then you can back up (tar-ball) and transfer the
> Slurm state directories:
>
> StateSaveLocation=/var/spool/slurmctld
>
> However, I don't know if the name of the ControlMachine is hard-coded in
> the StateSaveLocation files?
>
> I strongly suggest that you try to make a test migration of the cluster to
> the new DC to find out if it works or not.  Then you can always make
> multiple attempts without breaking anything.
>
> Best regards,
> Ole
>
>
> > On Mon, 24 Apr 2023 at 11:25, Ole Holm Nielsen
> > <Ole.H.Nielsen at fysik.dtu.dk> wrote:
> >
> >     On 4/24/23 06:58, Purvesh Parmar wrote:
> >      > thank you, but it's a change of hostnames as well, apart from IP
> >      > addresses, for the slurm server, the database server and the
> >      > slurmd compute nodes.
> >
> >     I suggest that you talk to your networking people and request that
> >     the old DNS names be created in the new network's DNS for your Slurm
> >     cluster.  Then Ryan's solution will work.  Changing DNS names is a
> >     very simple matter!
> >
> >     My 2 cents,
> >     Ole
> >
> >
> >      > On Mon, 24 Apr 2023 at 10:04, Ryan Novosielski
> >      > <novosirj at rutgers.edu> wrote:
> >      >
> >      >     I think it’s easier than all of this. Are you actually
> >      >     changing names of all of these things, or just IP addresses?
> >      >     If they all resolve to an IP now and you can bring everything
> >      >     down and change the hosts files or DNS, it seems to me that if
> >      >     the names aren’t changing, that’s that. I know that “scontrol
> >      >     show cluster” will show the wrong IP address but I think that
> >      >     updates itself.
> >      >
> >      >     The names of the servers are in slurm.conf, but again, if the
> >      >     names don’t change, that won’t matter. If you have IPs there,
> >      >     you will need to change them.
> >      >
> >      >     Sent from my iPhone
> >      >
> >      >      > On Apr 23, 2023, at 14:01, Purvesh Parmar
> >      >      > <purveshp0507 at gmail.com> wrote:
> >      >      > 
> >      >      > Hello,
> >      >      >
> >      >      > We have Slurm 21.08 on Ubuntu 20. We have a cluster of 8
> >      >      > nodes. All Slurm communication happens over the 192.168.5.x
> >      >      > network (LAN). However, as per requirement, we are now
> >      >      > migrating the cluster to other premises, where we have
> >      >      > 172.16.1.x (LAN). I have to migrate the entire network
> >      >      > including SLURMDBD (mariadb), SLURMCTLD and SLURMD. The
> >      >      > cluster network is also changing from 192.168.5.x to
> >      >      > 172.16.1.x, and each node will be assigned an IP address
> >      >      > from the 172.16.1.x network.
> >      >      > The cluster has been running for the last 3 months and it
> >      >      > is required to maintain the old usage stats as well.
> >      >      >
> >      >      >
> >      >      >  Is the procedure correct as below :
> >      >      >
> >      >      > 1) Stop slurm
> >      >      > 2) suspend all the queued jobs
> >      >      > 3) backup slurm database
> >      >      > 4) change the slurm & munge configuration i.e. munge conf,
> >      >      >    mariadb conf, slurmdbd.conf, slurmctld.conf, slurmd.conf
> >      >      >    (on compute nodes), gres.conf, service file
> >      >      > 5) Later, do the update in the slurm database by executing
> >      >      >    the command below for all the nodes:
> >      >      >    sacctmgr modify node where node=old_name set name=new_name
> >      >      >    Also, I think the slurm server name and slurmdbd server
> >      >      >    name need to be updated as well. How to do it, still
> >      >      >    checking
> >      >      > 6) Finally, start slurmdbd, slurmctld on server and slurmd
> >      >      >    on compute nodes
> >      >      >
> >      >      > Please help and guide for above.
> >      >      >
> >      >      > Regards,
> >      >      >
> >      >      > Purvesh Parmar
> >      >      > INHAIT
> >
>
>
>
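
The slurm.conf host-name changes discussed in the thread (NodeName, the partition Nodes list, AccountingStorageHost, ControlMachine) could be scripted roughly as below. This is only a sketch under stated assumptions: the helper `rename_hosts` and the `dc1-*`/`dc2-*` names are hypothetical, keys are matched with the capitalisation shown, and host lists are plain comma-separated names (bracketed ranges such as `node[01-08]` are not handled).

```python
import re

# slurm.conf keys mentioned in the thread whose values are host names.
HOST_KEYS = ("ControlMachine", "AccountingStorageHost", "NodeName", "Nodes")
HOST_PATTERN = re.compile(r"\b(%s)=(\S+)" % "|".join(HOST_KEYS))


def rename_hosts(conf_text, mapping):
    """Return conf_text with old host names replaced by new ones.

    Only values of HOST_KEYS are touched; comment lines are left alone.
    Assumes plain comma-separated host names (no bracketed ranges).
    """
    def fix_value(match):
        key, value = match.group(1), match.group(2)
        # Names missing from the mapping are kept unchanged.
        hosts = [mapping.get(name, name) for name in value.split(",")]
        return "%s=%s" % (key, ",".join(hosts))

    out = []
    for line in conf_text.splitlines():
        if line.lstrip().startswith("#"):
            out.append(line)
        else:
            out.append(HOST_PATTERN.sub(fix_value, line))
    return "\n".join(out)


# Hypothetical old/new names, purely for illustration.
sample = "\n".join([
    "# head node is dc1-head",
    "ControlMachine=dc1-head",
    "AccountingStorageHost=dc1-db",
    "NodeName=dc1-n1,dc1-n2 CPUs=8",
    "PartitionName=main Nodes=dc1-n1,dc1-n2 Default=YES",
])
mapping = {
    "dc1-head": "dc2-head",
    "dc1-db": "dc2-db",
    "dc1-n1": "dc2-n1",
    "dc1-n2": "dc2-n2",
}
updated = rename_hosts(sample, mapping)
print(updated)
```

Working on a copy of slurm.conf and comparing the result against the original before deploying it fits the test-migration approach suggested above.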