[slurm-users] Migrate the slurmdbd service to another server

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Mon Mar 4 18:12:10 UTC 2019

On 04-03-2019 16:30, Loris Bennett wrote:
>> On 3/4/19 2:26 PM, Loris Bennett wrote:
>>> Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> writes:
>>>> We're one of the many Slurm sites which run the slurmdbd database daemon on the
>>>> same server as the slurmctld daemon.  This works without problems at our site
>>>> given our modest load; however, SchedMD recommends running the daemons on
>>>> separate servers.
>>>> Contemplating how to upgrade our cluster from Slurm 17.11 to 18.08, I've come to
>>>> appreciate the advantage of running the daemons on separate servers: One can
>>>> upgrade slurmdbd to 18.08 while keeping slurmctld at 17.11 (for a while at
>>>> least).  This enables us to upgrade to 18.08 in the recommended order without
>>>> any interruption to our running jobs and without any cluster downtime.
>>> Can't one do this even with only one server?  We have always run both
>>> slurmctld and slurmdbd on one machine and have performed all the updates
>>> without any downtime.
>> For minor upgrade 17.11.x to 17.11.y there is no issue because the MySQL
>> database layout is unchanged.
>> Major upgrades such as 17.11 to 18.08 are potentially riskier; see for example
>> this list thread "Extreme long db upgrade 16.05.6 -> 17.11.3":
>> https://lists.schedmd.com/pipermail/slurm-users/2018-February/000612.html
>> I recommend studying the instructions in
>> https://slurm.schedmd.com/quickstart_admin.html#upgrade.
> That is indeed the protocol we follow.
>> See also the slides on "Upgrading" in
>> https://slurm.schedmd.com/SLUG18/field_notes2.pdf from the SLUG meeting 2018
>> (https://slurm.schedmd.com/publications.html).
>> Updating the database layout during a Slurm major upgrade can in special
>> situations lead to problems, so it's safer to do the upgrade separately for
>> slurmdbd and slurmctld.  This is why I've decided to move my slurmdbd and
>> database to a separate server now.  The slurmctld which governs the entire
>> cluster is thereby unaffected as I "play" with the database upgrade, and I can
>> upgrade Slurm without any cluster downtime.
> I don't understand how the separation of the two services onto two
> machines in the production environment makes such a difference.  No
> matter where the slurmdbd is running, the slurmctld will attempt to
> contact it and cache data if the slurmdbd is unreachable.  Or is the
> point more that, with a second machine you can do an offline conversion
> of the database, i.e. it is good to have a test and a production
> environment?

This is a nice discussion!  My reasoning is:

If slurmdbd and slurmctld both run on the same machine, you MUST upgrade 
the RPMs simultaneously, for example, 17.11.13 to 18.08.5.  When 
slurmdbd runs on a separate machine, you can upgrade that one without 
affecting slurmctld.

Mind you, SchedMD's recommended upgrade sequence is to proceed
incrementally in this order:

1. slurmdbd
2. slurmctld
3. slurmd (on nodes)
4. Slurm commands (on login hosts)
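
The ordering above can be written down as a small consistency check.  This
is only a sketch: the names UPGRADE_ORDER, parse_version and
order_violations are made up for illustration, and the rule it encodes
(a component later in the list should never be at a newer major release
than one earlier in the list) is my reading of the sequence above, not
something taken from SchedMD's tools.

```python
# Upgrade order from the list above: earlier entries are upgraded first.
UPGRADE_ORDER = ["slurmdbd", "slurmctld", "slurmd", "commands"]


def parse_version(text):
    """Turn a version string such as '18.08.5' into a tuple (18, 8, 5)."""
    return tuple(int(part) for part in text.split("."))


def order_violations(versions):
    """Return (earlier, later) pairs where the later component is newer.

    `versions` maps each name in UPGRADE_ORDER to a version string.
    Only the major release (e.g. 18.08) is compared, because updates
    within one major release are not the concern here.
    """
    violations = []
    for earlier, later in zip(UPGRADE_ORDER, UPGRADE_ORDER[1:]):
        if parse_version(versions[later])[:2] > parse_version(versions[earlier])[:2]:
            violations.append((earlier, later))
    return violations


# slurmdbd already upgraded, everything else still on 17.11: consistent.
dbd_first = order_violations({
    "slurmdbd": "18.08.5", "slurmctld": "17.11.13",
    "slurmd": "17.11.13", "commands": "17.11.13",
})

# slurmctld upgraded before slurmdbd: out of order.
ctld_first = order_violations({
    "slurmdbd": "17.11.13", "slurmctld": "18.08.5",
    "slurmd": "17.11.13", "commands": "17.11.13",
})

print(dbd_first)
print(ctld_first)
```

With slurmdbd on its own server, the first state is one you can stay in
for a while; on a shared server, steps 1 and 2 happen together.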

There is a risk in lumping steps 1 and 2 together into a single step,
especially if the database upgrade runs into a problem or takes a very
long time.  If you are forced to roll back and downgrade slurmdbd to
the old version, you would have to downgrade slurmctld at the same
time, and that may cause problems of its own.

A crucial part of slurmctld is the StateSaveLocation 
(/var/spool/slurmctld) directory which is being updated all the time due 
to cluster activity.  You don't want to compromise the operation of 
slurmctld while upgrading slurmdbd.

I certainly recommend testing and timing the database and slurmdbd 
upgrade on a non-production node before the real upgrade.
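
For the timing part, a generic wrapper is enough.  The sketch below is an
assumption on my side about how one might do it: time_command is a
hypothetical helper, and the command it runs here is only a placeholder
so that the example works anywhere.  On the test node you would
substitute whatever step you actually want to time.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    completed = subprocess.run(argv, check=False)
    return completed.returncode, time.monotonic() - start


# Placeholder command (a no-op Python process); replace it with the
# real step being timed on the non-production node.
exit_code, seconds = time_command([sys.executable, "-c", "pass"])
print(f"exit code {exit_code}, took {seconds:.1f} s")
```

A non-zero exit code in the test run is a good reason not to start the
real upgrade yet.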

> On the other hand, the Quick Start Admin Guide
> (https://slurm.schedmd.com/quickstart_admin.html) does mention "head
> node, compute nodes, and slurmdbd node".  I had always assumed a
> separate slurmdbd node was mainly useful for performance reasons at
> sites with a huge throughput of jobs, but maybe I am missing something.

For me, the safety of the upgrade is most important.  You're right that
high-throughput sites will want to separate the dbd and ctld services
for performance reasons.


