[slurm-users] restart slurmd on nodes w/ running jobs?
Prentice Bisbal
pbisbal at pppl.gov
Mon Jul 30 12:39:26 MDT 2018
Paul and Chris,
Thanks for the information. This is the first time I had a reason to
restart the slurmd processes (instead of just 'scontrol reconfigure')
outside of a maintenance window, and I wanted to be 100% sure before
risking killing all the user jobs on a Friday afternoon.
I'm happy to say the operation was a success.
Prentice
On 07/27/2018 08:47 PM, Paul Edmon wrote:
>
> Restarting slurmd should be fine assuming they come back before the
> communications time out. I restart slurmd's all the time and haven't
> had any real problems.
>
> -Paul Edmon-
>
>
> On 7/27/2018 6:51 PM, Chris Harwell wrote:
>> It is possible, but double check your config for timeouts first.
>>
>> On Fri, Jul 27, 2018, 15:31 Prentice Bisbal <pbisbal at pppl.gov> wrote:
>>
>> Slurm-users,
>>
>> I'm still learning Slurm, so I have what I think is a basic question.
>> Can you restart slurmd on nodes where jobs are running, or will that
>> kill the jobs? I ran into the same problem as described here:
>>
>> https://bugs.schedmd.com/show_bug.cgi?id=3535
>>
>> I believe the best way to fix this is to restart slurmd on all my nodes,
>> but I've been unable to determine conclusively whether I can do that w/o
>> killing running jobs. I've spent some time googling this, but couldn't
>> find a definitive answer one way or the other. I prefer to not kill a
>> bunch of user jobs on a Friday afternoon.
>>
>> --
>> Prentice
>>
>>
>> --
>> Chris Harwell
>
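
The advice in this thread (check the timeouts, then restart slurmd) can be
sketched roughly as follows. This is an illustration, not something the
posters wrote. It assumes the timeout in question is SlurmdTimeout (the
thread does not name it), that `scontrol show config` prints a line of the
form "SlurmdTimeout = 300 sec", and that slurmd is managed by systemd. The
helper name slurmd_timeout_seconds is hypothetical.

```shell
#!/bin/sh
# Sketch: look up the slurmd timeout before restarting slurmd on a node.

# Read `scontrol show config` output on stdin and print the SlurmdTimeout
# value in seconds. Assumes a line such as "SlurmdTimeout = 300 sec".
slurmd_timeout_seconds() {
    awk -F= '$1 ~ /^[[:space:]]*SlurmdTimeout[[:space:]]*$/ {
        gsub(/[^0-9]/, "", $2); print $2
    }'
}

# Only query the cluster when the Slurm client tools are installed.
if command -v scontrol >/dev/null 2>&1; then
    timeout=$(scontrol show config | slurmd_timeout_seconds)
    echo "SlurmdTimeout is ${timeout:-unknown} seconds"
    # The restart should finish well inside that window.
    # Run as root on the compute node (assumes a systemd unit named slurmd):
    #   systemctl restart slurmd
    # Afterwards, confirm the node's jobs are still listed:
    #   squeue -w "$(hostname -s)"
fi
```

The restart and the follow-up check are left as comments so that nothing is
restarted by accident; uncomment them only after confirming the timeout
leaves enough headroom on your cluster.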