[slurm-users] Effect of slurmctld and slurmdb going down on running/pending jobs
Tina Friedrich
tina.friedrich at it.ox.ac.uk
Thu Jun 24 09:26:24 UTC 2021
I thought setting partitions to DOWN would kill jobs?
Amjad - in my experience, the server running slurmdbd & slurmctld can be
rebooted with no effect on running jobs. You can't submit whilst it's
down, and I'm not precisely sure what happens to jobs that are just
finishing - but really the impact should be minimal.
(I've done exactly what you need to do - reboot so a change in
disk size is picked up - at least once with the cluster running.)
It is absolutely safe to restart slurmctld (and slurmdbd) with jobs
running on the cluster; it really is something that I, at least, do all
the time.
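
If you want to convince yourself, one rough way is to record the running
job IDs before the reboot and compare them afterwards. The sketch below is
only that, a sketch: the helper names (parse_job_ids, lost_jobs,
running_job_ids) are made up for illustration and are not part of Slurm,
and it assumes squeue is on the PATH of whatever host you run it on.

```python
from __future__ import annotations

import subprocess


def parse_job_ids(squeue_output: str) -> set[str]:
    """Turn `squeue -h -o %i` output (one job ID per line) into a set."""
    return {line.strip() for line in squeue_output.splitlines() if line.strip()}


def lost_jobs(before: set[str], after: set[str]) -> set[str]:
    """Job IDs that were running before the restart but are not running now."""
    return before - after


def running_job_ids() -> set[str]:
    """Ask squeue for the IDs of all currently running jobs (hypothetical helper)."""
    result = subprocess.run(
        ["squeue", "-h", "-t", "RUNNING", "-o", "%i"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_job_ids(result.stdout)
```

Call running_job_ids() once before the reboot and once after, then pass
both sets to lost_jobs(). A job that shows up there has not necessarily
been killed; it may simply have finished in the meantime, so check its
final state with sacct -j <jobid> before drawing conclusions.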
Tina
On 24/06/2021 10:16, Josef Dvoracek wrote:
> hi,
>
> just set the partitions to "DOWN" to avoid unexpected behavior for users
> and reboot the slurm(ctl|dbd)+sql box. Running jobs are, in my experience,
> not affected.
> No need to drain nodes.
>
> josef
>
> On 24. 06. 21 0:54, Amjad Syed wrote:
>> Hello all
>> We have a cluster running CentOS 7. Our Slurm scheduler is
>> running on a VM and we are running out of disk space for /var.
>> The Slurm InnoDB data is taking most of the space. We intend to expand
>> the vdisk for the Slurm server. This will require a reboot for the
>> changes to take effect. Do we have to stop users submitting jobs by
>> draining all partitions and then restart the server, that is slurmctld,
>> slurmdbd and mariadb? Or will restarting the Slurm VM have no effect on
>> running/pending jobs?
>>
>> Sincerely
>>
>> Amjad
>
--
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
Research Computing and Support Services
IT Services, University of Oxford
http://www.arc.ox.ac.uk http://www.it.ox.ac.uk