[slurm-users] Slurm cloud scheduling/power saving
toomuchit at gmail.com
Thu Apr 1 16:57:48 UTC 2021
Run 'sinfo -R' to see if any of your nodes are marked down or drained.
If so, resume them and see if things work.
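Roughly, those two steps look like the sketch below. The node name cloud-0 is made up, so substitute whatever 'sinfo -R' reports. The resume_node helper and the DRY_RUN variable are only there for illustration: by default the script prints the scontrol command rather than running it.

```shell
#!/bin/sh
# Step 1 (run by hand on the controller): list nodes that are down or
# drained, together with the reason recorded for each.
#   sinfo -R

# Step 2: return a node to service. resume_node is a hypothetical helper;
# with DRY_RUN=1 (the default here) it only prints the command.
resume_node() {
    node="$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "scontrol update NodeName=$node State=RESUME"
    else
        scontrol update NodeName="$node" State=RESUME
    fi
}

resume_node "cloud-0"   # hypothetical node name
```

Set DRY_RUN=0 to actually issue the update, then submit the job again and check whether the ResumeProgram gets called.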
On 4/1/2021 1:53 AM, Steve Brasier wrote:
> Hi all, anyone have suggestions for debugging cloud nodes not
> resuming? I've had this working before but I'm now using "configless"
> mode so wondering if that's an issue.
> If I log in as SlurmUser and run the ResumeProgram manually, the
> specified node(s) boot, and if I log into them `sinfo` works although
> it only shows the "static" nodes, not the newly booted "cloud" nodes.
> So that at least shows the program works, the image works, and new
> nodes can contact the slurmctld.
> However if I run a job which requires cloud nodes it immediately goes
> Pending showing "Nodes required for job are DOWN, DRAINED or reserved
> for jobs in higher priority partitions". Looking at SlurmctldLogFile
> with SlurmdDebug=debug5 I don't see any attempt to boot the nodes at
> all :-(.
> I can post slurm.conf if anyone wants to look but I think the
> important parameters are probably that I've got:
> That look right?
> thanks for any suggestions!
> http://stackhpc.com/
> Please note I work Tuesday to Friday.