[slurm-users] Upgrade from 17.02.11 to 21.08.2 and state information

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Thu Feb 3 19:55:57 UTC 2022


On 03-02-2022 16:37, Nathan Smith wrote:
> Yes, we are running slurmdbd. We could arrange enough downtime to do an 
> incremental upgrade of major versions as Brian Andrus suggested, at 
> least on the slurmctld and slurmdbd systems. The slurmds I would just do 
> a direct upgrade once the scheduler work was completed.

As Brian Andrus said, each upgrade step may span at most 2 major 
Slurm releases, and that applies to the slurmd daemons as well!  Don't 
do a "direct upgrade" of slurmd across more than 2 major releases!
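
To illustrate the rule, here is a small Python sketch (not an official 
tool) that works out the intermediate releases for a given jump.  The 
release list is the sequence of major Slurm releases from 17.02 through 
21.08; the function name upgrade_path and its parameters are made up for 
this example.

```python
# Major Slurm releases from 17.02 through 21.08, oldest first.
RELEASES = ["17.02", "17.11", "18.08", "19.05", "20.02", "20.11", "21.08"]


def upgrade_path(start, target, max_jump=2, releases=RELEASES):
    """Return the releases to install, in order, so that no single
    step spans more than max_jump major releases.

    Hypothetical helper for illustration only.
    """
    i = releases.index(start)
    j = releases.index(target)
    if i > j:
        raise ValueError("target release is older than the starting release")
    path = []
    while i < j:
        # Jump as far as allowed, but never past the target.
        i = min(i + max_jump, j)
        path.append(releases[i])
    return path


print(upgrade_path("17.02", "21.08"))
# prints ['18.08', '20.02', '21.08']
```

So going from 17.02 to 21.08 takes at least three upgrade steps, and 
every daemon (slurmdbd, slurmctld, slurmd) has to follow a path like 
this one.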

I recommend separate physical servers for slurmdbd and slurmctld.  Then 
you can upgrade slurmdbd without taking the cluster offline.  It's OK 
for slurmdbd to be down for many hours, since slurmctld caches the state 
information in the meantime.

I've described the Slurm upgrade process in detail in my Wiki page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm

Since you start from 17.02, you have to be extremely cautious when 
upgrading the database!  See the Wiki page for details.  Make sure to 
test the database upgrade on a test server using a database dump, 
instead of on the real slurmdbd server.

I hope this helps.

/Ole

> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf 
> Of *Brian Haymore
> *Sent:* Wednesday, February 2, 2022 1:51 PM
> *To:* slurm-users at schedmd.com; Slurm User Community List 
> <slurm-users at lists.schedmd.com>
> *Subject:* [EXTERNAL] Re: [slurm-users] Upgrade from 17.02.11 to 21.08.2 
> and state information
> 
> Are you running slurmdbd in your current setup?  If you are then the 
> upgrade path there might have additional considerations moving this far 
> in versions.
> 
> --
> Brian D. Haymore
> University of Utah
> Center for High Performance Computing
> 155 South 1452 East RM 405
> Salt Lake City, Ut 84112
> Phone: 801-558-1150, Fax: 801-585-5366
> http://bit.ly/1HO1N2C 
> 
> ------------------------------------------------------------------------
> 
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on 
> behalf of Nathan Smith <sminatha at ohsu.edu>
> *Sent:* Wednesday, February 2, 2022 2:38 PM
> *To:* slurm-users at schedmd.com
> *Subject:* [slurm-users] Upgrade from 17.02.11 to 21.08.2 and state 
> information
> 
> 
> The "Upgrades" section of the quick-start guide [0] warns:
> 
>  > Slurm permits upgrades to a new major release from the past two major
>  > releases, which happen every nine months (e.g. 20.02.x or 20.11.x to
>  > 21.08.x) without loss of jobs or other state information. State
>  > information from older versions will not be recognized and will be
>  > discarded, resulting in loss of all running and pending jobs.
> 
> We are planning for an upgrade from 17.02.11 to 21.08.2. As a part of
> our upgrade procedure we'd be bringing the scheduler to full stop, so
> the loss of running and pending jobs would not be a concern. Is there
> anything more to state information than running and pending jobs? For
> example, would the JobID count revert to 1 in the case of such an
> upgrade?
> 
> [0] https://slurm.schedmd.com/quickstart_admin.html#upgrade 
> 
> -- 
> Nathan Smith
> Research Systems Engineer
> Advanced Computing Center
> Oregon Health & Science University
> 


