[slurm-users] Cloud Scheduling Cluster Size Limit

Rupert Madden-Abbott rupert.madden.abbott at gmail.com
Fri Sep 18 10:29:27 UTC 2020


Hi,

The Cloud Scheduling Guide [1] recommends setting TreeWidth to at least the
maximum cluster size in order to disable hierarchical communications. The
maximum TreeWidth value is 65533. Does this effectively mean that cloud
clusters are limited to 65533 nodes? What is the expected behaviour if I run
a cloud cluster where NODE_COUNT > TREE_WIDTH?
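
To make the arithmetic behind this question concrete, here is a minimal
sketch of a simplified fanout model. It assumes that a message reaches up to
TreeWidth nodes per forwarding hop, so h hops reach roughly TreeWidth ** h
nodes. The function name hops_needed and the model itself are my own
illustration, not Slurm's actual forwarding code, and it does not answer what
Slurm really does in the NODE_COUNT > TREE_WIDTH case.

```python
# Simplified model (an assumption, not Slurm's implementation):
# each forwarding hop multiplies the number of reachable nodes by TreeWidth.

MAX_TREE_WIDTH = 65533  # maximum TreeWidth value mentioned above


def hops_needed(node_count: int, tree_width: int) -> int:
    """Return the smallest h such that tree_width ** h >= node_count."""
    if node_count < 1 or tree_width < 2:
        raise ValueError("node_count must be >= 1 and tree_width >= 2")
    hops = 1
    reach = tree_width
    while reach < node_count:
        reach *= tree_width
        hops += 1
    return hops


# One hop means the controller contacts every node directly (flat).
print(hops_needed(65533, MAX_TREE_WIDTH))    # 1
# More than one hop means some nodes would have to forward messages.
print(hops_needed(100000, MAX_TREE_WIDTH))   # 2
```

Under this model, any cluster larger than TreeWidth needs at least two hops,
which is exactly the situation where nodes would need to know each other's
addresses.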

When new cloud nodes are added to the cluster, I must update the controller
with the IP addresses and hostnames of those nodes. It therefore seems like
it should be possible for the controller to distribute that information to
all existing nodes (using hierarchical communications) and to all new nodes
(using flat, direct communications). Hierarchical communications could then
be used for the existing cluster.

Are there any plans to add this to Slurm, and are there any workarounds that
might allow me to achieve this right now? For example, can I manually
update all of the compute nodes in the cluster with information on all
other nodes as new cloud nodes are added?



[1]: https://slurm.schedmd.com/elastic_computing.html