We set SlurmdTimeout=600. The docs say not to go any higher than 65533 seconds.
The FAQ also has info about SlurmdTimeout. The worst thing that could happen is that it will take longer for nodes to be marked as down:
>A node is set DOWN when the slurmd daemon on it stops responding for SlurmdTimeout as defined in slurm.conf.
I wouldn't set it too high, but what counts as too high or too low will vary from site to site, depending on how busy your controllers and your network are.
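
For reference, a minimal sketch of the relevant slurm.conf line. The value 600 is just what we use; nothing else here is site-specific, and the comments are my own summary rather than text from the docs:

```
# slurm.conf (fragment)
# Seconds the controller waits for a slurmd to respond before
# the node is set DOWN. Documented maximum: 65533.
SlurmdTimeout=600
```

After editing slurm.conf, something like "scontrol reconfigure" followed by "scontrol show config | grep SlurmdTimeout" should show whether the controller picked up the new value, assuming the file is in sync across your nodes.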
Regards
--Mick
We've been running one cluster with SlurmdTimeout = 1200 sec for a
couple of years now, and I haven't seen any problems due to that.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo