[slurm-users] ec2 elastic node

Chris Samuel chris at csamuel.org
Sat Mar 17 06:33:21 MDT 2018


On Thursday, 15 March 2018 6:04:47 PM AEDT Arie Blumenzweig wrote:

> # sinfo
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> cloud*       up   infinite      1  down* slurm-node0

It looks like Slurm thinks the node was booted, but cannot talk to it.
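
(The trailing "*" on the state in sinfo output is what marks a node as not responding.) If you have more than one cloud node, something like the following could list the ones in that condition. This is only a rough sketch: not_responding is a made-up helper name, and it assumes the input is one "nodename state" pair per line.

```shell
# Hypothetical helper: read "nodename state" pairs on stdin and print the
# names of nodes whose state ends in "*" (sinfo's not-responding marker).
not_responding() {
    awk '$2 ~ /\*$/ { print $1 }'
}

# Possible usage on the controller (not executed here):
#   sinfo -h -N -o '%N %t' | not_responding
```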

> [2018-03-13T15:38:21.401] debug2: Error connecting slurm stream socket at
> 172.31.38.99:6818: Connection timed out

Is it possible that it booted with that IP address but a firewall is blocking 
access to slurmd?
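
One way to test that from the controller host is to try opening a TCP connection to the address and port from the log line. A minimal sketch, assuming bash and the coreutils timeout command are installed; port_open is a hypothetical helper name, not part of Slurm:

```shell
# Hypothetical helper: port_open HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection could be opened, non-zero otherwise
# (124 from timeout(1) usually means packets were silently dropped).
port_open() {
    local host=$1 port=$2 wait=${3:-5}
    timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Possible usage (not executed here):
#   port_open 172.31.38.99 6818 && echo reachable || echo "blocked or down"
```

A timeout rather than an immediate refusal would be consistent with a firewall or security group dropping the traffic, but that is only a hint and not proof.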

I've not played with the cloud stuff for a long time but you may need to try:

scontrol update node=slurm-node0 state=POWER_DOWN

to see if that gets it back into its offline state properly, so that it can 
be booted again.
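
If it helps, the same step plus a follow-up check could be wrapped up roughly like this. It is a sketch only: reset_cloud_node and the SCONTROL override are made up for illustration, and it assumes you run it as a user allowed to update node state.

```shell
# Hypothetical wrapper: mark a node POWER_DOWN, then print its record so the
# resulting State= and Reason= fields can be inspected.
# Setting SCONTROL=echo gives a dry run that only prints the commands' arguments.
reset_cloud_node() {
    local node=$1
    "${SCONTROL:-scontrol}" update nodename="$node" state=POWER_DOWN &&
    "${SCONTROL:-scontrol}" show node "$node"
}

# Possible usage (not executed here):
#   reset_cloud_node slurm-node0
```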

Good luck!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



