[slurm-users] Effect of slurmctld and slurmdb going down on running/pending jobs

Thu Jun 24 05:27:46 UTC 2021

Just in case, increase SlurmdTimeout in slurm.conf, so that once the
controller is back you have time to fix any communication problems
between slurmd and slurmctld. Otherwise the restart should not affect
running or pending jobs. Stop slurmctld first, then slurmdbd. When the
disk changes are done, start slurmdbd first and then slurmctld.
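
The order above could be sketched roughly like this. It assumes the
daemons run as systemd units named slurmctld and slurmdbd, which may
differ on your installation. The RUN variable and the two function names
are only illustrative; by default the script just prints the commands
instead of running them.

```shell
#!/bin/sh
# Sketch only: the unit names slurmctld and slurmdbd are assumptions,
# check them with `systemctl list-units` on the Slurm server.
# RUN defaults to "echo", so commands are printed, not executed.
# Set RUN="" in the environment (as root) to really run them.
RUN="${RUN-echo}"

# Before the maintenance: controller first, then the accounting daemon.
stop_services() {
    $RUN systemctl stop slurmctld
    $RUN systemctl stop slurmdbd
}

# After the disk work and reboot: reverse order, slurmdbd first.
start_services() {
    $RUN systemctl start slurmdbd
    $RUN systemctl start slurmctld
}
```

Call stop_services before resizing the disk and start_services once the
VM is back up. The value currently in effect can be checked with
`scontrol show config | grep SlurmdTimeout`.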

Cheers,

Barbara

On 6/24/21 12:54 AM, Amjad Syed wrote:
> Hello all,
> We have a cluster running CentOS 7. Our Slurm scheduler is running
> on a VM and we are running out of disk space for /var.
> The Slurm InnoDB database is taking most of the space. We intend to
> expand the vdisk for the Slurm server. This will require a reboot for
> the changes to take effect. Do we have to stop users submitting jobs
> by draining all partitions and then restart the server, that is
> slurmctld, slurmdbd and mariadb? Or will restarting the Slurm VM have
> no effect on running/pending jobs?
>
> Sincerely
>
> Amjad