[slurm-users] [External] Re: Internet connection loss with srun to a node

Prentice Bisbal pbisbal at pppl.gov
Mon Aug 3 19:22:54 UTC 2020


Not necessarily. If the cluster is on a private network, some node that 
connects to the public network needs to be configured to act as a NAT 
gateway to forward traffic meant for the outside world. This doesn't 
happen automatically. Some cluster admins intentionally don't do this 
for security and bandwidth reasons. Other times it's simply an 
oversight.
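
A minimal sketch of how one might check this from a compute node, 
assuming a little-endian Linux host that exposes /proc/net/route. The 
function names (default_gateway, can_connect, report) are made up for 
illustration and are not part of Slurm or any library. It only tells you 
whether the node has a default route and whether an outbound TCP 
connection succeeds; it cannot tell you whether that gateway does NAT.

```python
import socket
import struct


def default_gateway(route_table: str):
    """Return the default gateway IP from /proc/net/route-style text, or None."""
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        destination, gateway, flags = fields[1], fields[2], int(fields[3], 16)
        # Destination 00000000 is the default route; flag 0x2 (RTF_GATEWAY)
        # means traffic for it is sent via a gateway.
        if destination == "00000000" and flags & 0x2:
            # Assumption: little-endian host (x86, most ARM), so the hex
            # value is unpacked as a little-endian 32-bit integer.
            return socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
    return None


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(host: str = "google.com", port: int = 80) -> None:
    """Print the node's default gateway and whether outbound TCP works."""
    with open("/proc/net/route") as f:
        print("default gateway:", default_gateway(f.read()))
    print(f"outbound TCP to {host}:{port}:", can_connect(host, port))
```

Calling report() once on the frontend and once inside an srun shell on 
the compute node gives two results to compare. No default gateway on the 
compute node, or a gateway plus a failed connection, would point at 
routing/NAT rather than at Slurm.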

Allowing access *out* to the internet doesn't enable bitcoin mining and 
such. Allowing traffic from the outside *in* to the cluster is what 
causes that kind of shenanigans.

Prentice

On 8/2/20 5:13 PM, Brian Andrus wrote:
>
> This is very likely by design of the cluster and/or network. Otherwise 
> users could use the cluster to mine bitcoin and such.
>
> Brian Andrus
>
> On 8/2/2020 7:11 AM, Mahmood Naderan wrote:
>> I thought that maybe srun doesn't transfer all settings from the head 
>> node to the compute node.
>> The wget command works on frontend but doesn't work on the compute.
>>
>> mahmood@main-proxy:~$ wget google.com
>> --2020-08-02 16:05:55-- http://google.com/
>> Resolving google.com (google.com)... 216.58.215.238, 2a00:1450:400a:800::200e
>> Connecting to google.com (google.com)|216.58.215.238|:80... connected.
>> HTTP request sent, awaiting response... 301 Moved Permanently
>> Location: http://www.google.com/ [following]
>> --2020-08-02 16:05:55-- http://www.google.com/
>> Resolving www.google.com (www.google.com)... 172.217.168.68, 2a00:1450:400a:803::2004
>> Connecting to www.google.com (www.google.com)|172.217.168.68|:80... connected.
>> HTTP request sent, awaiting response... 200 OK
>> Length: unspecified [text/html]
>> Saving to: ‘index.html’
>>
>> index.html                         [ <=>                                 ]  12.68K  --.-KB/s    in 0s
>>
>> 2020-08-02 16:05:56 (196 MB/s) - ‘index.html’ saved [12983]
>>
>> mahmood@main-proxy:~$ srun -p gpu_part --gres=gpu:titanv:1 --pty /bin/bash
>> mahmood@fry0:~$ wget google.com
>> --2020-08-02 16:05:30-- http://google.com/
>> Resolving google.com (google.com)... 216.58.215.238, 2a00:1450:400a:800::200e
>> Connecting to google.com (google.com)|216.58.215.238|:80... ^C
>> mahmood@fry0:~$
>>
>>
>>
>>
>> I will check the gateway with the admin.
>> Thanks for the hint.
>>
>>
>>
>> Regards,
>> Mahmood
>>
>>
>>
>>
>> On Sun, Aug 2, 2020 at 5:58 PM Renfro, Michael <Renfro at tntech.edu> wrote:
>>
>>     Probably unrelated to slurm entirely, and most likely has to do
>>     with lower-level network diagnostics. I can guarantee that it’s
>>     possible to access Internet resources from a compute node. Notes
>>     and things to check:
>>
>>     1. Both ping and http/https are IP protocols, but are very
>>     different (ping isn’t even TCP or UDP, it’s ICMP), so even if you
>>     needed proxy variables for http and https to work, they shouldn’t
>>     affect ping.
>>
>>     2. Do http or https transfers work from a compute node? A github
>>     clone, a test with curl or wget to a nearby web server? Do your
>>     proxy variables exist on the compute node, and most importantly,
>>     is there a proxy server listening and functional on the host and
>>     port that the variables point to?
>>
>>     3. What’s the default gateway for your compute nodes? Does that
>>     gateway provide network address translation (NAT) for the nodes,
>>     or does it work as a traditional router?
>>
>>     ------------------------------------------------------------------------
>>     *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
>>     Mahmood Naderan <mahmood.nt at gmail.com>
>>     *Sent:* Sunday, August 2, 2020 7:52:52 AM
>>     *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
>>     *Subject:* [slurm-users] Internet connection loss with srun to a node
>>     Hi
>>     A frontend machine is connected to the internet and from that
>>     machine, I use srun to get a bash on another node. But it seems
>>     that the node is unable to access the internet. The http_proxy
>>     and https_proxy are defined in ~/.bashrc
>>
>>     mahmood@main-proxy:~$ ping google.com
>>     PING google.com (216.58.215.238) 56(84) bytes of data.
>>     64 bytes from zrh11s02-in-f14.1e100.net (216.58.215.238): icmp_seq=1 ttl=114 time=1.38 ms
>>     ^C
>>     --- google.com ping statistics ---
>>     1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>     rtt min/avg/max/mdev = 1.384/1.384/1.384/0.000 ms
>>     mahmood@main-proxy:~$ srun -p gpu_part --gres=gpu:titanv:1 --pty /bin/bash
>>     mahmood@fry0:~$ ping google.com
>>     PING google.com (216.58.215.238) 56(84) bytes of data.
>>     ^C
>>     --- google.com ping statistics ---
>>     3 packets transmitted, 0 received, 100% packet loss, time 2026ms
>>
>>
>>
>>     I guess that is related to slurm and srun.
>>     Any idea for that?
>>
>>
>>
>>
>>
>>
>>
>>     Regards,
>>     Mahmood
>>
>>
-- 
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
http://www.pppl.gov
