[slurm-users] Burst to AWS cloud

Sajesh Singh ssingh at amnh.org
Tue Dec 15 20:23:13 UTC 2020


Brian,
  Thank you for the info. I will definitely keep your recommendations handy while putting this together.

-SS-


From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Brian Andrus
Sent: Tuesday, December 15, 2020 3:14 PM
To: slurm-users at lists.schedmd.com
Subject: Re: [slurm-users] Burst to AWS cloud


I have done that for several clients.

  1.  Staging data is a pain. The simplest approach was to make it part of the job script, or to make the job itself dependent on a separate staging job. Where bandwidth is an issue, we have used bbcp.
  2.  Depending on size and connectivity, you can use hosts files or create a subdomain for the cluster nodes. I prefer the latter. Just use static IPs for your cloud nodes. You do need to ensure network connectivity, etc., of course.
  3.  SchedMD has the info on cloud nodes: https://slurm.schedmd.com/elastic_computing.html
  4.  Try to isolate everything you use so it isn't overly dependent on other groups' services (e.g., DNS, authentication, etc.) unless you can be made aware of any changes they are making, so you aren't surprised. Also, avoid network mounts on nodes. Performance takes a big hit when you have that going over a direct-connect or VPN.
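
To make item 1 concrete, here is a minimal sketch of chaining a compute job behind a separate staging job. It assumes sbatch is on the PATH and relies on its --parsable and --dependency=afterok options; the script names (stage.sh, compute.sh) and the helper functions are hypothetical and only for illustration, not something taken from a particular site.

```python
import subprocess


def staging_command(stage_script):
    """Build the sbatch command for the data-staging job.

    --parsable makes sbatch print only the job ID (optionally followed
    by ';<cluster>'), which is easier to capture than the default message.
    """
    return ["sbatch", "--parsable", stage_script]


def compute_command(compute_script, staging_job_id):
    """Build the sbatch command for the compute job.

    afterok: the compute job becomes eligible only once the staging
    job has finished with exit code 0.
    """
    return [
        "sbatch",
        "--parsable",
        f"--dependency=afterok:{staging_job_id}",
        compute_script,
    ]


def parse_job_id(sbatch_output):
    """Extract the job ID from --parsable output ('1234' or '1234;cluster')."""
    return sbatch_output.strip().split(";")[0]


def submit_with_staging(stage_script, compute_script, run=subprocess.run):
    """Submit the staging job, then the compute job that depends on it.

    'run' is injectable so the logic can be exercised without a live
    Slurm controller; by default it is subprocess.run.
    """
    staged = run(staging_command(stage_script),
                 check=True, capture_output=True, text=True)
    stage_id = parse_job_id(staged.stdout)
    computed = run(compute_command(compute_script, stage_id),
                   check=True, capture_output=True, text=True)
    return stage_id, parse_job_id(computed.stdout)


if __name__ == "__main__":
    # Hypothetical script names; replace with your own batch scripts.
    print(submit_with_staging("stage.sh", "compute.sh"))
```

If the staging job fails, the dependent compute job never starts, so cloud nodes are not spun up to run against missing data. Whether such a job is then cancelled or left pending depends on how the job and the site are configured.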

Brian Andrus


On 12/15/2020 12:02 PM, Sajesh Singh wrote:
We are currently investigating the use of the cloud scheduling features within an on-site Slurm installation and were wondering if anyone had any experiences they wish to share from trying to use this feature. In particular, I am interested to know:

https://slurm.schedmd.com/elastic_computing.html

1) Recommendations for staging the data needed by the nodes in the cloud
2) How did you handle name resolution?
3) Any resources/documentation in particular that proved helpful while setting up the environment
4) Any bits of advice or horror stories that may be helpful in avoiding pitfalls


Regards,

-SS-