[slurm-users] Burst to AWS cloud

Tue Dec 15 20:14:08 UTC 2020

I have done that for several clients.

 1. Staging data is a pain. The simplest thing was to have it as part of
    the job script, or have the job itself be dependent upon a separate
    staging job. Where bandwidth is an issue, we have implemented bbcp.
 2. Depending on size and connectivity, you can use hosts files or
    create a subdomain for the cluster nodes. I prefer the latter. Just
    use static IPs for your cloud nodes. You do need to ensure
    connectivity between your networks, etc., of course.
 3. SchedMD has the info on cloud nodes:
    https://slurm.schedmd.com/elastic_computing.html
 4. Try to isolate everything you use so it isn't overly dependent on
    some other group's services (e.g., DNS, authentication, etc.) unless
    you can stay aware of any changes they are making, so you aren't
    surprised.
    Also, avoid network mounts on nodes. Performance takes a big hit
    when you have that going over a direct-connect or VPN.
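
For the "separate staging job" approach in point 1, here is a rough
sketch of the idea, not a drop-in tool. It assumes two hypothetical
batch scripts, stage_data.sh and compute.sh, and the helper names are
only for illustration. It relies on sbatch's --parsable and
--dependency=afterok options so the compute job starts only if staging
succeeded.

```python
import subprocess


def staging_cmd(stage_script):
    # --parsable makes sbatch print only the job ID
    # (optionally followed by ";cluster_name").
    return ["sbatch", "--parsable", stage_script]


def parse_job_id(sbatch_output):
    # Keep the job ID and drop any ";cluster_name" suffix.
    return sbatch_output.strip().split(";")[0]


def compute_cmd(stage_job_id, compute_script):
    # afterok: run only after the staging job exits with code 0.
    return ["sbatch", f"--dependency=afterok:{stage_job_id}", compute_script]


def submit_with_staging(stage_script, compute_script):
    # Submit the staging job first and capture its job ID.
    result = subprocess.run(
        staging_cmd(stage_script),
        check=True, capture_output=True, text=True,
    )
    job_id = parse_job_id(result.stdout)
    # Submit the compute job so that it waits on the staging job.
    subprocess.run(compute_cmd(job_id, compute_script), check=True)
    return job_id
```

Calling submit_with_staging("stage_data.sh", "compute.sh") on a submit
host would queue both jobs. If staging fails, the compute job never
starts, so you are not paying for cloud nodes that have no data to
work on.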

Brian Andrus

On 12/15/2020 12:02 PM, Sajesh Singh wrote:
>
> We are currently investigating the use of the cloud scheduling 
> features within an on-site Slurm installation and were wondering if 
> anyone had any experiences they wish to share from trying to use 
> this feature. In particular, I am interested to know:
>
> https://slurm.schedmd.com/elastic_computing.html 
>
> 1)  Recommendations for staging the data needed by the nodes 
> in the cloud
>
> 2) How did you handle name resolution
>
> 3) Any resources/documentation in particular that proved helpful while 
> setting up the environment
>
> 4) Any bits of advice or horror stories that may be helpful in 
> avoiding pitfalls.
>
> Regards,
>
> -SS-
>