[slurm-users] Dynamic Node Shrinking/Expanding for Running Jobs in Slurm
Chris Samuel
chris at csamuel.org
Wed Jun 28 18:33:19 UTC 2023
On 28/6/23 04:02, Rahmanpour Koushki, Maysam wrote:
> Upon reviewing the current FAQ, I found that it states node shrinking is
> only possible for pending jobs. Unfortunately, it does not provide
> additional information or examples to clarify if this functionality can
> be extended to running jobs.
You can definitely release nodes from a running job. What I believe the
FAQ is saying is that you cannot change things like the number of cores
per node or the amount of memory you requested once a job is running.
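For illustration, here is a minimal sketch of shrinking a running job
from inside its own allocation. It relies on "scontrol update" with
either NumNodes= (a new node count) or NodeList= (the nodes to keep),
which is how the Slurm FAQ describes resizing. The helper names
shrink_command and shrink_running_job are hypothetical, and the sketch
assumes scontrol is on the PATH and SLURM_JOB_ID is set.

```python
import os
import subprocess


def shrink_command(job_id, num_nodes=None, node_list=None):
    """Build the scontrol argv that shrinks a running job.

    Give exactly one of num_nodes (new node count) or node_list
    (the nodes the job should keep). Hypothetical helper.
    """
    if (num_nodes is None) == (node_list is None):
        raise ValueError("give exactly one of num_nodes or node_list")
    if num_nodes is not None:
        if num_nodes < 1:
            raise ValueError("num_nodes must be at least 1")
        target = f"NumNodes={num_nodes}"
    else:
        target = f"NodeList={node_list}"
    return ["scontrol", "update", f"JobId={job_id}", target]


def shrink_running_job(num_nodes=None, node_list=None):
    """Run the shrink for the current job (assumes we are inside it)."""
    job_id = os.environ["SLURM_JOB_ID"]
    subprocess.run(shrink_command(job_id, num_nodes, node_list), check=True)
```

Calling shrink_running_job(num_nodes=2) from within the job would ask
Slurm to keep two nodes and release the rest. Per the FAQ, job steps
still running on the released nodes are killed unless they were started
with --no-kill, and variables such as SLURM_JOB_NUM_NODES and
SLURM_JOB_NODELIST are stale afterwards; scontrol writes a resize script
that can be sourced to reset them.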
As for why you'd do that: before we set up a mechanism to automatically
reboot nodes to address this, we had people who would request more nodes
than they needed, check how fragmented the kernel hugepages were on each
node, and then exclude the nodes that were too fragmented for their
needs.
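I don't know exactly how those users measured fragmentation, so the
following is only a sketch of one way it could be done. It assumes the
check reads /proc/buddyinfo on each node, and that a huge page
corresponds to a free block of order 9 (2 MiB huge pages on 4 KiB base
pages, as on typical x86_64 systems). The function names and the
threshold are hypothetical.

```python
def free_hugepage_blocks(buddyinfo_text, hugepage_order=9):
    """Count free huge-page-sized blocks listed in /proc/buddyinfo text.

    Each zone line looks like "Node 0, zone Normal c0 c1 ... c10", where
    cN is the number of free blocks of order N. A free block of a higher
    order is counted as the number of huge pages it could be split into.
    """
    total = 0
    for line in buddyinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue  # skip anything that is not a zone line
        counts = [int(c) for c in fields[4:]]
        for order, count in enumerate(counts):
            if order >= hugepage_order:
                total += count << (order - hugepage_order)
    return total


def nodes_to_keep(blocks_by_node, min_blocks):
    """Return the hostnames with at least min_blocks free huge-page blocks."""
    return sorted(
        host for host, blocks in blocks_by_node.items() if blocks >= min_blocks
    )
```

The result of nodes_to_keep could be joined with commas and passed as
the NodeList= value to "scontrol update" so the job retains only the
nodes that pass the check.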
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA