How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I created a cluster on AWS using the template from here. The only change relevant to this topic (though I don’t think it’s the root cause) is the following:
  provider:
      type: aws
      // xxx
      cache_stopped_nodes: True
+     security_group:
+         GroupName: test_ray_client_security_group
+         IpPermissions:
+             - FromPort: 10001
+               ToPort: 10001
+               IpProtocol: TCP
  // xxx
With this YAML, I created a Ray cluster from my laptop using the following command and got the head node IP addresses (suppose the public IP is 3.3.3.3 and the private IP is 10.0.0.20):
ray up example-full.yml
Then, from my laptop, I tried to connect to the cluster using the code below, but it doesn’t work:
from ray.job_submission import JobSubmissionClient
client = JobSubmissionClient("http://3.3.3.3:8265")
I checked my security group setup but found nothing wrong. What puzzles me is that even on the cluster head node itself (the security group accepts all traffic from within the security group), I can’t connect to the cluster through any address other than loopback:
This works (on the head node 10.0.0.20, outside of the container):
client = JobSubmissionClient("http://127.0.0.1:8265")
This doesn’t (on the head node 10.0.0.20, outside of the container):
client = JobSubmissionClient("http://10.0.0.20:8265")
ConnectionError: Failed to connect to Ray at address: http://10.0.0.20:8265
Digging a little deeper, it seems that port 6379 (the Ray GCS address) is bound to all available IP addresses, but port 8265 (the Ray web UI/dashboard) only accepts connections on the loopback address:
(base) ubuntu@ip-10-0-0-20:~$ sudo netstat -tlnp | grep 6379
tcp6 0 0 :::6379 :::* LISTEN 7012/gcs_server
(base) ubuntu@ip-10-0-0-20:~$ sudo netstat -tlnp | grep 8265
tcp 0 0 127.0.0.1:8265 0.0.0.0:* LISTEN 7088/python
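The same check can be reproduced without netstat. Below is a minimal sketch using only the Python standard library; the helper name `can_connect` is my own, and the addresses and ports are the ones from this post. If the dashboard is bound to loopback only, I would expect 127.0.0.1:8265 to be reachable and 10.0.0.20:8265 not to be when run on the head node.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Loopback and the head node's private IP, as used in this post.
    for host in ("127.0.0.1", "10.0.0.20"):
        # 6379 is the GCS port, 8265 is the dashboard port.
        for port in (6379, 8265):
            print(f"{host}:{port} reachable: {can_connect(host, port)}")
```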
Port forwarding, as mentioned here, should work, but it doesn’t seem programmatic enough. I was expecting something as simple as this.
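For reference, the port-forwarding workaround can at least be scripted. This is only a rough sketch, assuming plain SSH access to the head node as user `ubuntu`; the key path `~/.ssh/ray-key.pem` and the helper names `build_tunnel_command` and `open_tunnel` are hypothetical and not part of Ray.

```python
import os
import subprocess
from typing import List


def build_tunnel_command(
    head_ip: str,
    user: str = "ubuntu",
    key_path: str = "~/.ssh/ray-key.pem",  # hypothetical key location
    port: int = 8265,
) -> List[str]:
    """Build an ssh command forwarding local `port` to the head node's loopback."""
    return [
        "ssh",
        "-N",  # no remote command, only forward ports
        "-i", os.path.expanduser(key_path),
        "-L", f"{port}:127.0.0.1:{port}",
        f"{user}@{head_ip}",
    ]


def open_tunnel(head_ip: str, **kwargs) -> subprocess.Popen:
    """Start the tunnel in the background; call .terminate() on the result to close it."""
    return subprocess.Popen(build_tunnel_command(head_ip, **kwargs))
```

With the tunnel open (for example `open_tunnel("3.3.3.3")`), `JobSubmissionClient("http://127.0.0.1:8265")` on the laptop should reach the dashboard through the forwarded port. It still requires managing an SSH process, which is what I would like to avoid.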
Is there any configuration that lets the dashboard process listen on all available IP addresses? Or is any other setup needed to access the job submission API from outside the cluster?