[slurm-users] Burst to AWS cloud
Brian Andrus
toomuchit at gmail.com
Tue Dec 15 20:14:08 UTC 2020
I have done that for several clients.
1. Staging data is a pain. The simplest approach was to do it as part
of the job script, or to make the job itself dependent on a separate
staging job. Where bandwidth is an issue, we have used bbcp.
2. Depending on size and connectivity, you can use hosts files or
create a subdomain for the cluster nodes. I prefer the latter. Just
use static IPs for your cloud nodes. You do need to ensure
connectivity between the networks, etc., of course.
3. SchedMD has the info on cloud nodes:
https://slurm.schedmd.com/elastic_computing.html
4. Try to isolate everything you use so it isn't overly dependent on
some other group's services (e.g., DNS, authentication, etc.) unless
you can stay aware of any changes they are making, so you aren't
surprised. Also, avoid network mounts on nodes. Performance takes a
big hit when those go over a direct-connect or VPN.
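
To illustrate the separate staging job from item 1, here is a minimal Python sketch that submits a staging job and then a compute job that only starts if staging succeeded. It relies on sbatch's --parsable and --dependency=afterok options. The function names and the script names in the usage note are hypothetical, so treat this as one possible arrangement rather than the setup described above.

```python
import subprocess


def staging_cmd(stage_script):
    """Build the sbatch command for the staging job.

    --parsable makes sbatch print only the job ID (optionally followed
    by ';cluster'), which is easier to capture than the default message.
    """
    return ["sbatch", "--parsable", stage_script]


def compute_cmd(stage_job_id, compute_script):
    """Build the sbatch command for a job that waits on staging.

    afterok means the compute job becomes eligible only after the
    staging job has finished with exit code 0.
    """
    return ["sbatch", f"--dependency=afterok:{stage_job_id}", compute_script]


def parse_job_id(sbatch_output):
    """Extract the job ID from `sbatch --parsable` output."""
    return sbatch_output.strip().split(";")[0]


def submit_with_staging(stage_script, compute_script):
    """Submit both jobs and return the staging job ID.

    Assumes sbatch is on PATH; both script paths are supplied by the caller.
    """
    out = subprocess.run(
        staging_cmd(stage_script),
        check=True, capture_output=True, text=True,
    ).stdout
    job_id = parse_job_id(out)
    subprocess.run(compute_cmd(job_id, compute_script), check=True)
    return job_id
```

Calling submit_with_staging("stage_data.sh", "run_analysis.sh") on a login node would queue both jobs. If the staging job fails, the dependency can never be satisfied, so the compute job does not run against missing data and has to be cancelled or resubmitted.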
Brian Andrus
On 12/15/2020 12:02 PM, Sajesh Singh wrote:
>
> We are currently investigating the use of the cloud scheduling
> features within an on-site Slurm installation and were wondering if
> anyone had any experiences that they wish to share from trying to use
> this feature. In particular I am interested to know:
>
> https://slurm.schedmd.com/elastic_computing.html
>
> 1) Recommendations for staging the data needed by the nodes
> in the cloud
>
> 2) How did you handle name resolution
>
> 3) Any resources/documentation in particular that proved helpful while
> setting up the environment
>
> 4) Any bits of advice or horror stories that may be helpful in
> avoiding pitfalls.
>
> Regards,
>
> -SS-
>