[slurm-users] [External] Re: Moving Slurmctld and slurmdbd to a new host
Prentice Bisbal
pbisbal at pppl.gov
Tue Jan 19 19:21:04 UTC 2021
Thanks to both of you for your replies. I did the move this morning, and
it went off without a hitch. It does appear that the job state directory
keeps track of the queue data, because as soon as I copied those directories
over, I was able to see the queue information on the new Slurm controller.
I had done this operation once before, but it was a couple of years ago,
so I just wanted to be safe rather than sorry. Thanks for the help.
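
The copy step described above can be sketched in Python roughly as follows. This is a minimal, hypothetical helper (copy_state_dir is not part of Slurm) that copies a slurmctld state directory and then checks that every file arrived. It assumes that slurmctld is already stopped on the old host, so the state files are not being rewritten mid-copy; that the source is the directory named by StateSaveLocation in slurm.conf; and that the destination is a local or mounted path that does not exist yet. A plain rsync or scp would do the same job; the sketch mainly shows the verification step.

```python
import shutil
from pathlib import Path


def list_files(root: Path) -> set[str]:
    """Return every regular file under root as a path relative to root."""
    return {str(p.relative_to(root)) for p in root.rglob("*") if p.is_file()}


def copy_state_dir(src: str, dst: str) -> int:
    """Copy a slurmctld state directory tree from src to dst (hypothetical helper).

    Assumes slurmctld is stopped, so the files are not changing during the
    copy, and that dst does not exist yet (copytree refuses to overwrite).
    copytree's default copy function (shutil.copy2) preserves mode bits and
    timestamps but NOT ownership, so chown the result to SlurmUser if needed.
    Returns the number of files in the copy.
    """
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_dir():
        raise FileNotFoundError(f"state directory not found: {src}")
    shutil.copytree(src_path, dst_path)
    # Verify that every file present in the source also exists in the copy.
    copied = list_files(dst_path)
    missing = list_files(src_path) - copied
    if missing:
        raise RuntimeError(f"files missing after copy: {sorted(missing)}")
    return len(copied)
```

For example, copy_state_dir("/var/spool/slurmctld", "/mnt/newhost/var/spool/slurmctld") would copy the tree and return the file count; both paths are placeholders for your own StateSaveLocation and destination. Because shutil does not preserve ownership, the copy may need a chown to the SlurmUser before slurmctld is started on the new host.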
Prentice
On 1/16/21 1:43 PM, Michael Gutteridge wrote:
> I can confirm that as well. The state directory has all of that
> information. We just upgraded from 18.05 to 20.02 on a different host,
> and while the cluster was quiet (we had a maintenance reservation in
> place), there were running jobs that survived the upgrade.
>
> I think the big thing to watch out for is raising SlurmdTimeout in
> your config before the update. It might not be necessary, depending on
> the exact steps you're using, but it's useful insurance against job loss.
>
> HTH
>
> - Michael
>
>
> On Fri, Jan 15, 2021 at 7:51 PM Ryan Novosielski <novosirj at rutgers.edu> wrote:
>
> My understanding is that it’s the job state directory. Theoretically,
> if you back it up and then screw up and lose it, you can restore it and
> try again. There’s some mention of this in the upgrade docs, if I’m not
> mistaken (they suggest backing it up in case you mess up during the upgrade).
>
> --
> #BlackLivesMatter
> ____
> || \\UTGERS,
> |---------------------------*O*---------------------------
> ||_// the State | Ryan Novosielski - novosirj at rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark
> `'
>
>> On Jan 15, 2021, at 13:44, Prentice Bisbal <pbisbal at pppl.gov> wrote:
>>
>> Slurm users,
>>
>> I'm planning to move slurmctld and slurmdbd to a new host. I
>> know how to dump the MySQL DB from the old server and import it
>> on the new slurmdbd host, and I know how to copy the job state
>> directories to the new host. I plan to do this during our next
>> maintenance window, when there are no jobs running on the cluster.
>>
>> However, there will be plenty of jobs in the queue, so my
>> question is this: what will happen to jobs in the queue when I do
>> this? Is the queue information stored in the database, in the job
>> state directories, or in a third location? How can I make sure I
>> don't lose the state of the queue?
>>
>> --
>> Prentice
>>
>>
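
Michael's suggestion of raising SlurmdTimeout before the move could be scripted along these lines. This is a minimal Python sketch under stated assumptions, not a SchedMD tool: set_conf_value is a hypothetical helper that only rewrites the text of slurm.conf, and it assumes each parameter sits on its own line as Key=Value at the start of the line (slurm.conf also allows several pairs on one line, which this sketch does not handle). Parameter names in slurm.conf are case-insensitive, so the match ignores case; commented-out lines are left untouched.

```python
import re


def set_conf_value(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with `key` set to `value` (hypothetical helper).

    Assumption: one Key=Value pair per line, starting the line (leading
    spaces/tabs allowed). Matching ignores case because slurm.conf parameter
    names are case-insensitive. Lines starting with '#' never match, so
    commented-out settings are left alone. If the key is absent, a new
    Key=Value line is appended at the end.
    """
    pattern = re.compile(
        rf"^([ \t]*){re.escape(key)}[ \t]*=.*$", re.IGNORECASE | re.MULTILINE
    )
    new_line = f"{key}={value}"
    # A function replacement avoids backslash escapes in `value` being interpreted.
    updated, count = pattern.subn(lambda m: m.group(1) + new_line, conf_text)
    if count == 0:
        if updated and not updated.endswith("\n"):
            updated += "\n"
        updated += new_line + "\n"
    return updated
```

For example, set_conf_value(conf_text, "SlurmdTimeout", "3600") returns the edited text with a one-hour timeout (an arbitrary placeholder value). The file still has to be written back, distributed to the nodes, and picked up by the daemons (for instance with scontrol reconfigure), and the original value restored once the move is done.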
--
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
http://www.pppl.gov