[slurm-users] Effect of slurmctld and slurmdb going down on running/pending jobs

Josef Dvoracek jose at fzu.cz
Thu Jun 24 09:43:58 UTC 2021


 > I thought setting partitions to DOWN will kill jobs?
No, it just prevents queued jobs in the given partition from being started.
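
As a rough sketch of what I mean (not a tested procedure): the helper below sets partitions DOWN before the reboot and UP again afterwards, using `scontrol update PartitionName=... State=...`. The partition names `batch` and `gpu`, the `DRY_RUN` switch, and the function names are made up for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default here) prints each command instead of
# executing it; set DRY_RUN=0 on the Slurm controller to really run them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# set_partitions STATE PARTITION...
# Sets every listed partition to STATE (e.g. DOWN or UP).
set_partitions() {
    state="$1"
    shift
    for part in "$@"; do
        run scontrol update PartitionName="$part" State="$state"
    done
}
```

Usage would be something like `set_partitions DOWN batch gpu` before rebooting the box, and `set_partitions UP batch gpu` once slurmctld and slurmdbd are back. Jobs already running stay running while the partitions are DOWN.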

josef

On 24. 06. 21 11:26, Tina Friedrich wrote:
> I thought setting partitions to DOWN will kill jobs?
>
> Amjad - in my experience, the slurmdbd & slurmctld server can be 
> rebooted with no effect on running jobs. You can't submit whilst it's 
> down, and I'm not precisely sure what happens to jobs that are just 
> finishing - but really the impact should be minimal.
>
> (I've done exactly what you're needing to do - reboot so a change in 
> disk size is picked up - at least once with the cluster running.)
>
> It is absolutely safe to restart slurmctld (and slurmdbd) with jobs 
> running on the cluster; that really is something that I, at least, do 
> all the time.
>
> Tina
>
> On 24/06/2021 10:16, Josef Dvoracek wrote:
>> hi,
>>
>> just set the partitions to "DOWN" to avoid unexpected behavior for 
>> users, then reboot the slurm(ctld|dbd)+SQL box. In my experience, 
>> running jobs are not affected.
>> No need to drain nodes.
>>
>> josef
>>
>> On 24. 06. 21 0:54, Amjad Syed wrote:
>>> Hello all
>>> We have a cluster running CentOS 7. Our Slurm scheduler is 
>>> running on a VM and we are running out of disk space for 
>>> /var.
>>> The Slurm InnoDB database is taking most of the space. We intend to 
>>> expand the vdisk for the Slurm server. This will require a reboot for 
>>> the changes to take effect. Do we have to stop users from submitting 
>>> jobs by draining all partitions before restarting the server (that is, 
>>> slurmctld, slurmdbd and MariaDB)? Or will restarting the Slurm VM 
>>> have no effect on running/pending jobs?
>>>
>>> Sincerely
>>>
>>> Amjad
>>
>
-- 
Josef Dvoracek
Institute of Physics | Czech Academy of Sciences
cell: +420 608 563 558 | https://telegram.me/jose_d | FZU phone nr. : 2669

