[slurm-users] Internet connection loss with srun to a node

Rodrigo Santibáñez rsantibanez.uchile at gmail.com
Sun Aug 2 22:11:14 UTC 2020


Hi Mahmood,

I had the same problem some time ago when I took over the cluster and
wanted to update the OS. The login node has internet access, but the
compute nodes are connected only to the login node over a private network.
Therefore, the login node is the only machine that can reach the internet
unless you configure each compute node to use the login node as a proxy.
I believe I followed the instructions on this webpage to configure the
compute nodes (CentOS 7):

https://forums.centos.org/viewtopic.php?f=16&t=8583&start=10
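
As a rough sketch (not the exact steps from that page), the per-node setup
amounts to pointing the proxy variables at the login node. The hostname
main-proxy is taken from the prompt shown in this thread; port 3128 (Squid's
default) and a proxy service actually running on the login node are
assumptions, so adjust both to your site:

```shell
# Hypothetical values: replace with the host and port where a proxy
# (for example Squid) really listens on the login node.
PROXY_HOST="main-proxy"   # assumption: login node name as seen from the compute nodes
PROXY_PORT="3128"         # assumption: Squid's default port

# Tools such as wget, curl and git read these variables.
export http_proxy="http://${PROXY_HOST}:${PROXY_PORT}"
export https_proxy="$http_proxy"
export no_proxy="localhost,127.0.0.1"

# For yum on CentOS 7, the equivalent is one line in /etc/yum.conf (needs root):
#   proxy=http://main-proxy:3128
```

This only covers HTTP/HTTPS traffic. ping uses ICMP, so it will still fail
from the compute nodes unless the gateway also routes or NATs their traffic.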

Good luck configuring your machines.

Best regards

On Sun, Aug 2, 2020 at 17:15, Brian Andrus (<toomuchit at gmail.com>)
wrote:

> This is very likely by design of the cluster and/or network. Otherwise
> users could use the cluster to mine bitcoin and such.
>
> Brian Andrus
> On 8/2/2020 7:11 AM, Mahmood Naderan wrote:
>
> I thought that maybe srun doesn't transfer all settings from the head node
> to the compute node.
> The wget command works on the frontend but not on the compute node.
>
> mahmood@main-proxy:~$ wget google.com
> --2020-08-02 16:05:55--  http://google.com/
> Resolving google.com (google.com)... 216.58.215.238, 2a00:1450:400a:800::200e
> Connecting to google.com (google.com)|216.58.215.238|:80... connected.
> HTTP request sent, awaiting response... 301 Moved Permanently
> Location: http://www.google.com/ [following]
> --2020-08-02 16:05:55--  http://www.google.com/
> Resolving www.google.com (www.google.com)... 172.217.168.68, 2a00:1450:400a:803::2004
> Connecting to www.google.com (www.google.com)|172.217.168.68|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: unspecified [text/html]
> Saving to: ‘index.html’
>
> index.html                         [ <=>             ]  12.68K  --.-KB/s    in 0s
>
> 2020-08-02 16:05:56 (196 MB/s) - ‘index.html’ saved [12983]
>
> mahmood@main-proxy:~$ srun -p gpu_part --gres=gpu:titanv:1 --pty /bin/bash
> mahmood@fry0:~$ wget google.com
> --2020-08-02 16:05:30--  http://google.com/
> Resolving google.com (google.com)... 216.58.215.238, 2a00:1450:400a:800::200e
> Connecting to google.com (google.com)|216.58.215.238|:80... ^C
> mahmood@fry0:~$
>
> I will check the gateway with the admin.
> Thanks for the hint.
>
> Regards,
> Mahmood
>
> On Sun, Aug 2, 2020 at 5:58 PM Renfro, Michael <Renfro at tntech.edu> wrote:
>
>> This is probably unrelated to Slurm entirely and most likely a
>> lower-level network issue. I can guarantee that it’s possible to
>> access Internet resources from a compute node. Notes and things to check:
>>
>> 1. Both ping and http/https are IP protocols, but are very different
>> (ping isn’t even TCP or UDP, it’s ICMP), so even if you needed proxy
>> variables for http and https to work, they shouldn’t affect ping.
>>
>> 2. Do http or https transfers work from a compute node? A github clone, a
>> test with curl or wget to a nearby web server? Do your proxy variables
>> exist on the compute node, and most importantly, is there a proxy server
>> listening and functional on the host and port that the variables point to?
>>
>> 3. What’s the default gateway for your compute nodes? Does that gateway
>> provide network address translation (NAT) for the nodes, or does it work as
>> a traditional router?
>>
>> ------------------------------
>> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
>> Mahmood Naderan <mahmood.nt at gmail.com>
>> *Sent:* Sunday, August 2, 2020 7:52:52 AM
>> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
>> *Subject:* [slurm-users] Internet connection loss with srun to a node
>>  Hi
>> A frontend machine is connected to the internet, and from that machine I
>> use srun to get a bash shell on another node. But it seems that the node is
>> unable to access the internet. The http_proxy and https_proxy variables are
>> defined in ~/.bashrc.
>>
>> mahmood@main-proxy:~$ ping google.com
>> PING google.com (216.58.215.238) 56(84) bytes of data.
>> 64 bytes from zrh11s02-in-f14.1e100.net (216.58.215.238): icmp_seq=1 ttl=114 time=1.38 ms
>> ^C
>> --- google.com ping statistics ---
>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>> rtt min/avg/max/mdev = 1.384/1.384/1.384/0.000 ms
>> mahmood@main-proxy:~$ srun -p gpu_part --gres=gpu:titanv:1 --pty /bin/bash
>> mahmood@fry0:~$ ping google.com
>> PING google.com (216.58.215.238) 56(84) bytes of data.
>> ^C
>> --- google.com ping statistics ---
>> 3 packets transmitted, 0 received, 100% packet loss, time 2026ms
>>
>> I guess that is related to slurm and srun.
>> Any idea for that?
>>
>> Regards,
>> Mahmood
>>