[slurm-users] Effect of slurmctld and slurmdb going down on running/pending jobs
Josef Dvoracek
jose at fzu.cz
Thu Jun 24 09:43:58 UTC 2021
> I thought setting partitions to DOWN will kill jobs?
No, it just prevents queued jobs from starting in the given partition. Jobs that are already running are left alone.
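For reference, a minimal sketch of how the partitions could be flipped around the reboot. The partition names "batch" and "debug" and both helper function names are placeholders, not something from the original question. It only prints the scontrol commands unless DRY_RUN=0 is set, so it is safe to try on a box without Slurm.

```shell
#!/bin/sh
# Sketch only: mark partitions DOWN before the reboot and UP afterwards.
# With DRY_RUN=1 (the default here) the scontrol commands are printed,
# not executed.
DRY_RUN="${DRY_RUN:-1}"

# Placeholder partition names (an assumption); the real ones can be
# listed with: sinfo -h -o %R
PARTITIONS="${PARTITIONS:-batch debug}"

# set_partition_state NAME STATE  (hypothetical helper)
# STATE is DOWN (no new jobs start) or UP (normal scheduling).
set_partition_state() {
    cmd="scontrol update PartitionName=$1 State=$2"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

# set_all_partitions STATE  (hypothetical helper)
# Applies the same state to every partition in $PARTITIONS.
set_all_partitions() {
    for p in $PARTITIONS; do
        set_partition_state "$p" "$1"
    done
}
```

The intended use would be "set_all_partitions DOWN" before the reboot and "set_all_partitions UP" once slurmctld, slurmdbd and the database are back. Running "squeue -t RUNNING" before and after should show the same running jobs.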
josef
On 24. 06. 21 11:26, Tina Friedrich wrote:
> I thought setting partitions to DOWN will kill jobs?
>
> Amjad - in my experience, the slurmdbd & slurmctld server can be
> rebooted with no effect on running jobs. You can't submit whilst it's
> down, and I'm not precisely sure what happens to jobs that are just
> finishing - but really the impact should be minimal.
>
> (I've done exactly what you're needing to do - reboot so a change in
> disk size is picked up - at least once with the cluster running.)
>
> It is absolutely safe to restart slurmctld (and slurmdbd) with jobs
> running on the cluster; that really is something that I, at least, do
> all the time.
>
> Tina
>
> On 24/06/2021 10:16, Josef Dvoracek wrote:
>> hi,
>>
>> just set the partitions to "DOWN" to avoid unexpected behavior for
>> users and reboot the slurm(ctld|dbd)+sql box. In my experience,
>> running jobs are not affected.
>> No need to drain nodes.
>>
>> josef
>>
>> On 24. 06. 21 0:54, Amjad Syed wrote:
>>> Hello all
>>> We have a cluster running CentOS 7. Our Slurm scheduler is
>>> running on a VM and we are running out of disk space on
>>> /var.
>>> The Slurm InnoDB database is taking most of the space. We intend to
>>> expand the vdisk for the Slurm server. This will require a reboot for
>>> the changes to take effect. Do we have to stop users from submitting
>>> jobs by draining all partitions and then restart the server, that is,
>>> slurmctld, slurmdbd and mariadb? Or will restarting the Slurm VM
>>> have no effect on running/pending jobs?
>>>
>>> Sincerely
>>>
>>> Amjad
>>
>
--
Josef Dvoracek
Institute of Physics | Czech Academy of Sciences
cell: +420 608 563 558 | https://telegram.me/jose_d | FZU phone nr. : 2669