[slurm-users] Dynamic Node Shrinking/Expanding for Running Jobs in Slurm

Chris Samuel chris at csamuel.org
Wed Jun 28 18:33:19 UTC 2023


On 28/6/23 04:02, Rahmanpour Koushki, Maysam wrote:

> Upon reviewing the current FAQ, I found that it states node shrinking is 
> only possible for pending jobs. Unfortunately, it does not provide 
> additional information or examples to clarify if this functionality can 
> be extended to running jobs.

You can definitely release nodes from a running job. What I believe the 
FAQ is saying is that you cannot change things like the number of cores 
per node or the memory you requested once a job is running.
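
As a rough sketch of what releasing nodes looks like in practice: the
scontrol man page documents "scontrol update JobId=<id> NodeList=<nodes>"
for shrinking a running job to a subset of its current nodes. The helper
below only builds that command line. The function name and the node names
are made up for illustration, and you should check the behaviour against
your own Slurm version before relying on it.

```python
import os


def build_shrink_command(job_id, keep_nodes):
    """Build the scontrol argv that asks Slurm to keep only `keep_nodes`.

    `keep_nodes` must be a subset of the nodes the job currently holds;
    Slurm releases every other node in the allocation.
    """
    if not keep_nodes:
        raise ValueError("a job must keep at least one node")
    return [
        "scontrol",
        "update",
        f"JobId={job_id}",
        "NodeList=" + ",".join(keep_nodes),
    ]


if __name__ == "__main__":
    # SLURM_JOB_ID is set inside a job; "12345" is a placeholder otherwise.
    job_id = os.environ.get("SLURM_JOB_ID", "12345")
    # node001/node002 are hypothetical node names.
    print(" ".join(build_shrink_command(job_id, ["node001", "node002"])))
```

To actually apply the change you would pass that list to subprocess.run()
from inside the job. The man page also notes that srun commands issued
after the shrink need node and task counts that fit the new allocation.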

As for why you'd do that: before we set up a mechanism to automatically 
reboot nodes to address this, we had people who would request more nodes 
than they needed, check how fragmented memory was for kernel hugepages 
on each one, and then exclude the nodes that were too fragmented for 
their needs.
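
One way such a fragmentation check could be done (an assumption on my
part, not necessarily what those users ran) is to read /proc/buddyinfo on
each node and count how many hugepage-sized free blocks are left. The
sketch below assumes x86_64 with 4 KiB base pages, so a 2 MiB hugepage is
an order-9 block. The function name is hypothetical.

```python
import os


def free_hugepage_capacity(buddyinfo_text, hugepage_order=9):
    """Return {numa_node: hugepage-sized free blocks} from buddyinfo text.

    Each /proc/buddyinfo line looks like
        Node 0, zone   Normal  c0 c1 ... c10
    where cN is the number of free blocks of 2**N contiguous pages.
    A free block of order N >= hugepage_order can hold
    2**(N - hugepage_order) hugepages.
    """
    per_node = {}
    for line in buddyinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "Node":
            continue  # skip blank or unexpected lines
        numa_node = int(fields[1].rstrip(","))
        counts = [int(c) for c in fields[4:]]
        capacity = sum(
            count << (order - hugepage_order)
            for order, count in enumerate(counts)
            if order >= hugepage_order
        )
        per_node[numa_node] = per_node.get(numa_node, 0) + capacity
    return per_node


if __name__ == "__main__" and os.path.exists("/proc/buddyinfo"):
    with open("/proc/buddyinfo") as fh:
        print(free_hugepage_capacity(fh.read()))
```

Note that "Node" in buddyinfo means a NUMA node within one machine, not a
Slurm node. A job script could run this on every allocated node and keep
only the hosts whose totals are high enough for the application.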

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



