While I am not sure of your specifics, you could easily add lines to your suspend/resume scripts to check whether earlier tasks are still waiting and, if so, wait for them (or otherwise handle it) before starting the next operation.
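Here is a minimal sketch of that idea in Python. It assumes the cloud API call can be made to block until the operation has finished. Both ResumeProgram and SuspendProgram go through one wrapper that takes an exclusive file lock, so a later request waits for the earlier one instead of running at the same time. The lock path, the scale_nodes() function and the "choose the operation from the invoked file name" convention are hypothetical placeholders, not part of Slurm.

```python
#!/usr/bin/env python3
"""Sketch of a wrapper that serializes Slurm cloud scale-out/scale-in calls."""
import fcntl
import os
import sys
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical lock file; the resume and suspend scripts must share it.
LOCK_PATH = "/tmp/slurm_cloud_scale.lock"


def run_serialized(lock_path: str, action: Callable[[], T]) -> T:
    """Run action() while holding an exclusive flock() on lock_path.

    A second caller blocks in flock() until the first releases the lock,
    so at most one scale operation runs at a time on this host.
    """
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # wait for any earlier operation
        try:
            return action()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def scale_nodes(mode: str, hostlist: str) -> int:
    """Hypothetical placeholder for the site-specific cloud API call.

    It must not return until the cloud reports the operation as finished,
    otherwise the lock is released too early.
    """
    print(f"{mode} {hostlist}")
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Slurm passes the affected nodes as a hostlist expression, e.g. "cloud[1-4]".
    nodes = sys.argv[1]
    # Hypothetical convention: install this file under two names (e.g. resume.py
    # and a suspend.py symlink) and pick the operation from the invoked name.
    op = "suspend" if "suspend" in os.path.basename(sys.argv[0]) else "resume"
    sys.exit(run_serialized(LOCK_PATH, lambda: scale_nodes(op, nodes)))
```

Because a request may now sit waiting on the lock, ResumeTimeout (and SuspendTimeout) would need to cover that queueing delay as well as the cloud operation itself; otherwise slurmctld may mark nodes that resume late as DOWN.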

Brian Andrus


On 1/15/2024 12:22 AM, 김종록 wrote:

Hello.

I'm going to use Slurm's cloud feature in private cloud.

The problem is that my cloud cannot perform instance scale-out/scale-in operations concurrently.

This means that once a scale-out or scale-in is triggered, no other operation is processed until that one has completed.

So any Suspend/Resume request issued later must start only after the previous operation has completed, but I cannot know exactly how long that will take, so I cannot set an accurate timeout.

Is there any way in Slurm to limit how many Suspend/Resume requests run at the same time?

As far as I know, there are SuspendRate and ResumeRate, but these only limit the number of nodes per minute; they do not limit concurrency.