[slurm-users] [External] Re: Moving Slurmctld and slurmdbd to a new host

Prentice Bisbal pbisbal at pppl.gov
Tue Jan 19 19:21:04 UTC 2021


Thanks to both of you for your replies. I did the move this morning, and 
it went off without a hitch. It does appear that the job state directory 
keeps track of the queue data, because as soon as I copied those dirs 
over, I was able to see the queue information on the new Slurm controller.
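
For anyone repeating this, here is a minimal sketch of how the copy could be checked before starting slurmctld on the new host. It is not what I actually ran. It assumes slurmctld is stopped on the old host while copying (so the state files are not changing), and that both copies of the state directory are readable from one machine. Otherwise, build a manifest on each host and compare them. The function names and the paths in the usage comment are hypothetical.

```python
import hashlib
from pathlib import Path


def state_manifest(root):
    """Map each file under root (by relative path) to its SHA-256 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def diff_manifests(old, new):
    """Return (missing, extra, changed) lists of relative paths.

    missing: present in old but not in new
    extra:   present in new but not in old
    changed: present in both, but with different contents
    """
    missing = sorted(set(old) - set(new))
    extra = sorted(set(new) - set(old))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return missing, extra, changed


# Usage sketch (hypothetical paths; use your own StateSaveLocation):
#   old = state_manifest("/mnt/old-controller/statesave")
#   new = state_manifest("/path/to/new/statesave")
#   missing, extra, changed = diff_manifests(old, new)
#   All three lists should be empty if the copy is complete.
```

If all three lists come back empty, the new controller has a byte-for-byte copy of the old state directory.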

I had done this operation once before, but it was a couple of years ago, 
so I wanted to be safe rather than sorry. Thanks for the help.

Prentice

On 1/16/21 1:43 PM, Michael Gutteridge wrote:
> I'd confirm that as well.  The state directory has all of that 
> information.  We just upgraded from 18.05 to 20.02 on a different host 
> and while the cluster was quiet (we had a maintenance reservation in 
> place) there were running jobs which survived the upgrade.
>
> I think the big thing to watch out for is raising SlurmdTimeout in 
> your slurm.conf prior to the update.  Might not be necessary depending on 
> the exact steps you're using, but it's useful insurance against job loss.
>
> HTH
>
>  - Michael
>
>
> On Fri, Jan 15, 2021 at 7:51 PM Ryan Novosielski <novosirj at rutgers.edu> wrote:
>
>     My understanding is that it's the job state directory. Theoretically,
>     if you back it up, screw up and lose it, you can restore it and try
>     again. There’s some mention of this in the upgrade docs if I’m not
>     mistaken (they suggest backing it up in case you mess up during the
>     upgrade).
>
>     -- 
>     #BlackLivesMatter
>      ____
>     || \\UTGERS,     |---------------------------*O*---------------------------
>     ||_// the State  |         Ryan Novosielski - novosirj at rutgers.edu
>     || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
>     ||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
>          `'
>
>>     On Jan 15, 2021, at 13:44, Prentice Bisbal <pbisbal at pppl.gov> wrote:
>>
>>     Slurm users,
>>
>>     I'm planning on moving slurmctld and slurmdbd to a new host. I
>>     know how to dump the MySQL DB from the old server and import it
>>     to the new slurmdbd host, and I know how to copy the job state
>>     directories to the new host. I plan on doing this during our next
>>     maintenance window when there are no jobs running on the cluster.
>>
>>     However, there will be plenty of jobs in the queue, so my
>>     question is this: What will happen to jobs in the queue when I do
>>     this? Is the queue information stored in the database or the job
>>     state directories, or a third location? How can I make sure I
>>     don't lose the state of the queue?
>>
>>     -- 
>>     Prentice
>>
>>
-- 
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
http://www.pppl.gov
