We set SlurmdTimeout=600. The docs say not to go any higher than 65533 seconds:
https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdTimeout
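If you want to sanity-check the value before pushing out a new slurm.conf, a small script is enough. This is a hypothetical helper, not a Slurm tool. The names `parse_slurmd_timeout` and `check_slurmd_timeout` are made up for illustration, and the parser is deliberately simplified: it only understands `Key=Value` tokens and `#` comments. The only Slurm-specific fact it relies on is the 65533-second ceiling from the docs linked above.

```python
from typing import Optional

# Upper bound quoted from the slurm.conf documentation for SlurmdTimeout.
MAX_SLURMD_TIMEOUT = 65533


def parse_slurmd_timeout(conf_text: str) -> Optional[int]:
    """Return SlurmdTimeout in seconds from slurm.conf-style text, or None.

    Hypothetical, simplified parser: drops '#' comments, splits each line on
    whitespace and looks for a Key=Value token whose key is SlurmdTimeout
    (compared case-insensitively). If the key appears more than once, this
    sketch keeps the last value it sees.
    """
    value = None
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0]  # strip trailing comment
        for token in line.split():
            key, sep, val = token.partition("=")
            if sep and key.lower() == "slurmdtimeout":
                value = int(val)
    return value


def check_slurmd_timeout(seconds: int) -> None:
    """Raise ValueError if seconds falls outside 0..MAX_SLURMD_TIMEOUT."""
    if not 0 <= seconds <= MAX_SLURMD_TIMEOUT:
        raise ValueError(
            f"SlurmdTimeout={seconds} is outside 0..{MAX_SLURMD_TIMEOUT} seconds"
        )


if __name__ == "__main__":
    example = """
# Hypothetical excerpt of a slurm.conf
SlurmctldTimeout=120
SlurmdTimeout=600   # value discussed in this thread
"""
    timeout = parse_slurmd_timeout(example)
    check_slurmd_timeout(timeout)
    print(timeout)  # 600
```

A real deployment would more likely just check what the running controller reports, but a check like this can catch a typo'd value before it ever reaches the cluster.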
The FAQ also has information about SlurmdTimeout. The worst that could happen is that it will take longer for nodes to be marked DOWN:
A node is set DOWN when the slurmd daemon on it stops responding for SlurmdTimeout as defined in slurm.conf.
https://slurm.schedmd.com/faq.html
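To make the trade-off concrete, here is a deliberately simplified model of the FAQ rule in Python. The function name `would_be_marked_down` is hypothetical. The model ignores the controller's actual polling schedule, so treat it only as an approximation of when a silent node would be flagged.

```python
from datetime import datetime, timedelta


def would_be_marked_down(
    last_response: datetime, now: datetime, slurmd_timeout_s: int
) -> bool:
    """Simplified model of the FAQ rule: a node counts as DOWN once its
    slurmd has been silent for at least SlurmdTimeout seconds.

    Hypothetical sketch: it does not model how or when slurmctld actually
    checks on nodes.
    """
    return now - last_response >= timedelta(seconds=slurmd_timeout_s)


if __name__ == "__main__":
    silent_since = datetime(2024, 2, 12, 7, 0, 0)
    now = silent_since + timedelta(minutes=8)  # an 8-minute outage
    for timeout in (300, 600, 1200):
        print(timeout, would_be_marked_down(silent_since, now, timeout))
    # 300 True
    # 600 False
    # 1200 False
```

With the default-style 300 seconds, an 8-minute silence gets the node flagged. With 600 or 1200 seconds it does not yet, which is exactly the "takes longer to set nodes down" cost described above.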
I wouldn't set it too high, but what counts as too high or too low will vary from site to site, depending on how busy your controllers and your network are.
Regards,
--Mick

________________________________
From: Bjørn-Helge Mevik via slurm-users <slurm-users@lists.schedmd.com>
Sent: Monday, February 12, 2024 7:16 AM
To: slurm-users@schedmd.com <slurm-users@schedmd.com>
Subject: [slurm-users] Re: Increasing SlurmdTimeout beyond 300 Seconds
We've been running one cluster with SlurmdTimeout = 1200 sec for a couple of years now, and I haven't seen any problems due to that.
--
Regards,
Bjørn-Helge Mevik, dr. scient.
Department for Research Computing
University of Oslo