[slurm-users] restart slurmd on nodes w/ running jobs?

Prentice Bisbal pbisbal at pppl.gov
Mon Jul 30 12:39:26 MDT 2018


Paul and Chris,

Thanks for the information. This is the first time I had a reason to 
restart the slurmd processes (instead of just 'scontrol reconfigure') 
outside of a maintenance window, and I wanted to be 100% sure before 
risking killing all the user jobs on a Friday afternoon.

I'm happy to say the operation was a success.
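
For anyone who finds this thread later, here is a rough sketch of the 
timeout check Chris suggested. It assumes 'scontrol show config' prints 
lines of the form 'Name = value sec'. The helper names (parse_timeouts, 
current_timeouts) and the sample values are made up for illustration and 
are not from my cluster. SlurmdTimeout is the setting that controls how 
long slurmctld waits for a slurmd to respond before marking the node 
DOWN, so that is likely the one to compare against your restart time.

```python
import re
import subprocess

# Hypothetical sample of `scontrol show config` output; values are illustrative only.
SAMPLE_CONFIG = """\
MessageTimeout          = 10 sec
SlurmctldTimeout        = 120 sec
SlurmdTimeout           = 300 sec
SlurmdUser              = root(0)
"""


def parse_timeouts(config_text):
    """Return {name: seconds} for each '<Something>Timeout = N' line."""
    timeouts = {}
    for line in config_text.splitlines():
        match = re.match(r"\s*(\w*Timeout)\s*=\s*(\d+)", line)
        if match:
            timeouts[match.group(1)] = int(match.group(2))
    return timeouts


def current_timeouts():
    """Query the live configuration; needs scontrol on PATH (not called here)."""
    result = subprocess.run(
        ["scontrol", "show", "config"],
        capture_output=True, text=True, check=True,
    )
    return parse_timeouts(result.stdout)


# Parse the sample text so the sketch runs without a Slurm installation.
for name, seconds in sorted(parse_timeouts(SAMPLE_CONFIG).items()):
    print(f"{name} = {seconds} sec")
```

On a real cluster you would call current_timeouts() instead of parsing 
the sample, and confirm that slurmd comes back well within SlurmdTimeout.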

Prentice

On 07/27/2018 08:47 PM, Paul Edmon wrote:
>
> Restarting slurmd should be fine assuming they come back before the 
> communications time out.  I restart slurmd's all the time and haven't 
> had any real problems.
>
> -Paul Edmon-
>
>
> On 7/27/2018 6:51 PM, Chris Harwell wrote:
>> It is possible, but double check your config for timeouts first.
>>
>> On Fri, Jul 27, 2018, 15:31 Prentice Bisbal <pbisbal at pppl.gov> wrote:
>>
>>     Slurm-users,
>>
>>     I'm still learning Slurm, so I have what I think is a basic
>>     question. Can you restart slurmd on nodes where jobs are running,
>>     or will that kill the jobs? I ran into the same problem as
>>     described here:
>>
>>     https://bugs.schedmd.com/show_bug.cgi?id=3535
>>
>>     I believe the best way to fix this is to restart slurmd on all my
>>     nodes, but I've been unable to determine conclusively whether I
>>     can do that w/o killing running jobs. I've spent some time
>>     googling this, but couldn't find a definitive answer one way or
>>     the other. I prefer to not kill a bunch of user jobs on a Friday
>>     afternoon.
>>
>>     -- 
>>     Prentice
>>
>>
>> -- 
>> Chris Harwell
>
