[slurm-users] [slurm 20.02.3] don't suspend nodes in down state

Jacek Budzowski j.budzowski at cyfronet.pl
Mon Aug 24 09:40:52 UTC 2020


Dear Herbert,

We had the same problem in our installation.
Unfortunately, we didn't find a more elegant solution than changing the Slurm
code (and recompiling slurmctld).
Here is the patch we use to prevent DOWN nodes from being suspended:

diff --git a/src/slurmctld/power_save.c b/src/slurmctld/power_save.c
index 1f8d77c..752b404 100644
--- a/src/slurmctld/power_save.c
+++ b/src/slurmctld/power_save.c
@@ -368,7 +368,7 @@ static void _do_power_work(time_t now)
                /* Suspend nodes as appropriate */
                if ((susp_state == 0)                                   &&
                    ((suspend_rate == 0) || (suspend_cnt < suspend_rate)) &&
-                   (IS_NODE_IDLE(node_ptr) || IS_NODE_DOWN(node_ptr))  &&
+                   (IS_NODE_IDLE(node_ptr))                            &&
                    (node_ptr->sus_job_cnt == 0)                        &&
                    (!IS_NODE_COMPLETING(node_ptr))                     &&
                    (!IS_NODE_POWER_UP(node_ptr))                       &&


Best regards,
Jacek Budzowski
On Mon, 24.08.2020 at 08:52 +0000, Steininger,
Herbert wrote:
> Hi,
> 
> how can I prevent Slurm from suspending nodes that I have set to the
> DOWN state for maintenance?
> I know about "SuspendExcNodes", but it doesn't seem like the right way
> to roll out slurm.conf every time this changes.
> Is there a state I can set so that the nodes don't get
> suspended?
> 
> It has happened a few times that I was working on a server and,
> after our idle time (1h), Slurm decided to suspend the node.
> 
> TIA,
> Herbert
> 
-- 

Jacek Budzowski
System administrator
ACC Cyfronet AGH