[slurm-users] Effect of slurmctld and slurmdb going down on running/pending jobs
barbara.krasovec at ijs.si
Thu Jun 24 05:27:46 UTC 2021
Just in case, increase SlurmdTimeout in slurm.conf before the
maintenance, so that when the controller is back you have time to fix
any communication problems between slurmd and slurmctld before nodes
are marked down. Otherwise the restart should not affect running or
pending jobs. Stop slurmctld first, then slurmdbd. Once the disk
changes are done, start slurmdbd first and then slurmctld.
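
The order above could be scripted roughly as follows. This is only a
sketch, not a tested procedure: it assumes the daemons run as systemd
units named slurmctld, slurmdbd and mariadb (check the names on your
system), and the helper names run, stop_services and start_services are
made up for illustration. Stopping and starting mariadb is my addition,
based on slurmdbd needing its database to be up. By default the script
only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch of the stop/start order for a slurmctld/slurmdbd maintenance window.
# Assumed unit names: slurmctld, slurmdbd, mariadb (verify on your system).
# Before starting, raise SlurmdTimeout in slurm.conf and apply the change,
# for example with `scontrol reconfigure`.

# Print commands instead of executing them unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

stop_services() {
    run systemctl stop slurmctld   # controller first
    run systemctl stop slurmdbd    # then the accounting daemon
    run systemctl stop mariadb     # database last (assumption, see above)
}

start_services() {
    run systemctl start mariadb    # database must be up before slurmdbd
    run systemctl start slurmdbd   # accounting daemon before the controller
    run systemctl start slurmctld  # controller last
}
```

Call stop_services before the reboot and start_services once the disk
has been expanded and the VM is back up.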
On 6/24/21 12:54 AM, Amjad Syed wrote:
> Hello all
> We have a cluster running CentOS 7. Our Slurm scheduler is
> running on a VM and we are running out of disk space for /var.
> The Slurm InnoDB data is taking most of the space. We intend to
> expand the vdisk for the Slurm server. This will require a reboot
> for the changes to take effect. Do we have to stop users submitting
> jobs by draining all partitions and then restart the server, that is
> slurmctld, slurmdbd and mariadb? Or will restarting the Slurm VM
> have no effect on running/pending jobs?