[slurm-users] Setting up a reactivity margin with SLURM

Corentin Mercier corentin.a.mercier at inria.fr
Mon May 23 08:03:38 UTC 2022


Hello, 

I am currently trying to save energy on a cluster running SLURM. 

I read the Power Saving guide and found exactly what I am looking for: SuspendTime. It allows me to shut nodes down after a certain idle time. 
However, I want to go further by keeping a small number of nodes idle in certain partitions so that small jobs can start instantly. 
In short, I want to keep a reactivity margin on certain partitions. 

In the documentation, I saw that it is possible to exclude given nodes from being shut down, but I want that list to be dynamic so that a certain number of nodes always stays idle. 
Here's an example: 
On partition A, there should always be 5 idle nodes available to new clients. As clients arrive, those idle nodes become allocated, and new nodes need to be started to replace them (they will stay idle until allocated). 
I would need to wake some nodes and update the exclusion list so that they stay idle. 
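
To make the example concrete, the bookkeeping could be sketched roughly as follows. This is only an illustration under stated assumptions, not an existing SLURM plugin: the function name plan_margin, the node names, and the simplified state labels ("idle", "allocated", "powering_up", "powered_down") are all hypothetical and would have to be mapped from whatever the cluster actually reports. Applying the result (waking the nodes and rewriting SuspendExcNodes) is deliberately left out.

```python
from typing import Dict, List, Tuple


def plan_margin(states: Dict[str, str], margin: int) -> Tuple[List[str], List[str]]:
    """Return (nodes_to_wake, nodes_to_protect) for one partition.

    `states` maps node name -> simplified state label. The labels used here
    are assumptions made for this sketch, not SLURM's own state strings.
    """
    idle = sorted(n for n, s in states.items() if s == "idle")
    resuming = sorted(n for n, s in states.items() if s == "powering_up")
    powered_down = sorted(n for n, s in states.items() if s == "powered_down")

    # Nodes that are already idle or on their way up count toward the margin.
    shortfall = max(0, margin - len(idle) - len(resuming))
    to_wake = powered_down[:shortfall]

    # Protect at most `margin` nodes from suspension; any surplus idle nodes
    # stay eligible to be shut down after SuspendTime.
    protected = (idle + resuming + to_wake)[:margin]
    return to_wake, protected


if __name__ == "__main__":
    # Hypothetical snapshot of partition A with a margin of 5 idle nodes.
    partition_a = {
        "a01": "allocated",
        "a02": "idle",
        "a03": "idle",
        "a04": "powered_down",
        "a05": "powered_down",
        "a06": "powered_down",
        "a07": "powered_down",
    }
    wake, protect = plan_margin(partition_a, margin=5)
    print("wake:", wake)
    print("exclude from suspend:", protect)
```

With this snapshot, two nodes are already idle, so three powered-down nodes are selected to be woken, and the five protected nodes would form the new exclusion list. Something like this would have to be re-run whenever allocations change, which is the dynamic part I am missing.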

Since someone else may have faced the same issue, I looked through SLURM's GitHub for available plugins, but I couldn't find any that implements a dynamic reactivity margin. 

So, is there a plugin that implements such a mechanism? Or should I work with the SuspendProgram/ResumeProgram scripts to update the SuspendExcNodes list by hand? 
I'd be glad to hear about any other existing solution too. 

Regards, 
C.Mercier 