[slurm-users] slurm elastic compute / power saving

Brian Andrus toomuchit at gmail.com
Tue Jan 7 21:19:08 UTC 2020


I think we would need to see your SuspendScript to get a better idea of 
what is happening.
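
For comparison, here is a rough sketch of the part of a suspend script that tends to go wrong. Slurm passes the nodes to the suspend program as a single hostlist expression (e.g. compute-[2-3]), not as separate names, so the script has to expand it before releasing anything. The names expand_hostlist, suspend and release are made up for illustration and are not part of Slurm, and the expansion only handles simple ranges; a real script would more safely call 'scontrol show hostnames'.

```python
import re


def expand_hostlist(expr):
    """Expand a simple hostlist such as 'compute-[2-3,7]' into node names.

    Illustrative only: one bracket group per name, no nesting.
    """
    names = []
    # Split on commas that are not inside a [...] group.
    for part in re.split(r",(?![^\[]*\])", expr):
        match = re.fullmatch(r"(.*?)\[([^\]]+)\](.*)", part)
        if match is None:
            names.append(part)  # plain node name, nothing to expand
            continue
        prefix, body, suffix = match.groups()
        for item in body.split(","):
            if "-" in item:
                low, high = item.split("-")
                width = len(low)  # preserve zero padding such as 01-03
                for n in range(int(low), int(high) + 1):
                    names.append(f"{prefix}{n:0{width}d}{suffix}")
            else:
                names.append(f"{prefix}{item}{suffix}")
    return names


def suspend(expr, release):
    """Call release(node) once per node named in the hostlist expression.

    'release' is a placeholder for whatever returns the instance to your
    cloud provider; it is an assumption, not a Slurm or cloud API.
    """
    for node in expand_hostlist(expr):
        release(node)


print(expand_hostlist("compute-[2-3]"))  # ['compute-2', 'compute-3']
```

If your script treats the argument as one literal node name, the instances may never actually be released, or only some of them will be.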

That error indicates the nodes are likely not running slurmd and the 
control daemon thinks they are still up.

What is the output of 'sinfo -R'?

Brian Andrus

On 1/7/2020 3:42 AM, Steve Brasier wrote:
> Hi all,
>
> I've got elastic compute working with slurm but on "suspend" I get 
> something like the following in the slurmctld log:
>
> power down request repeating for node compute-2
> power down request repeating for node compute-3
> error: Nodes compute-[2-3] not responding
>
> The docs say that the SuspendScript should only have to return the 
> nodes to the cloud - but the above suggests that maybe the script 
> should also notify the slurmctld that the nodes are offline? Is that 
> right, and if so what state should they be set to?
>
> many thanks
> Steve


