How to connect to workers in subnetworks

Reed_Pan · May 4, 2022, 3:59am

How severely does this issue affect your experience of using Ray?

High: It blocks me from completing my task.

Hi, everyone! I am trying to set up a distributed Ray cluster following manual-ray-cluster-setup.

I have successfully brought up the head node (A) and one worker (B), but I am having trouble bringing up workers in subnetworks (C1, C2).

Here is my network setup:

I run ray start --head --port=6379 to set up the head and ray start --address=0.0.0.0:6379 to connect. After running ray start on Worker C, the Ray dashboard shows Worker C as connected, but the IP address listed is its local IP (1.0.0.0) rather than its external IP (0.0.0.2:1).

When I run ray status to check, the Worker C node is not listed. Moreover, when I try to schedule tasks on Worker C, they stay stuck in pending. I think it might be an IP issue.
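One way to compare what the dashboard shows with what the cluster has actually registered is to dump the node table from a driver on the head node. The following is a minimal sketch, assuming that `ray.nodes()` returns one dict per node with `NodeManagerAddress`, `Alive`, and `Resources` keys; `summarize_nodes` and `print_cluster_nodes` are hypothetical helper names, not part of Ray.

```python
def summarize_nodes(nodes):
    """Return sorted (address, alive, cpu_count) tuples, one per node record."""
    rows = []
    for node in nodes:
        address = node.get("NodeManagerAddress", "?")
        alive = bool(node.get("Alive", False))
        cpus = node.get("Resources", {}).get("CPU", 0)
        rows.append((address, alive, cpus))
    return sorted(rows)


def print_cluster_nodes():
    """Attach to the running cluster and print the address each node registered."""
    import ray  # imported lazily so summarize_nodes works without Ray installed

    ray.init(address="auto")  # connect to the existing cluster on this machine
    for address, alive, cpus in summarize_nodes(ray.nodes()):
        print(f"{address:>15}  alive={alive}  CPU={cpus}")
```

Calling `print_cluster_nodes()` on head A would show which address Worker C registered with, which is presumably the address the other nodes will try to reach it on.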

I have tried ray start --head --port=6379 --node-ip-address 0.0.0.2:1 to connect, but got the warning Unable to connect to GCS at 0.0.0.2:1:6379.

I have struggled with these issues for days. Any information will be highly appreciated!

sangcho · May 4, 2022, 11:29pm

https://docs.ray.io/en/latest/ray-core/configure.html#ray-ports

Did you make sure all ports are properly open?
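A quick way to test this in both directions is a plain TCP connect check, run once from the worker against the head and once from the head against the worker. This is a minimal sketch using only the Python standard library; `port_is_open` and `check_ports` are hypothetical helper names, and the ports to probe are whatever the Ray ports page lists for your setup.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and unreachable hosts
        return False


def check_ports(host, ports, timeout=2.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

For example, `check_ports(head_ip, [6379])` from worker C1 tests the path to the head, and the same call from head A with the worker's address and ports tests the reverse path. A `False` result can mean either that a firewall or NAT blocks the port or that nothing is listening on it, so it only narrows the problem down.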

Reed_Pan · May 4, 2022, 11:47pm

Thanks for the timely reply!

I am sure that worker C1 can reach head A, and head A can ssh to worker C1 through its IP (0.0.0.2:1).

But the thing is that worker C1 opens a port on its local subnetwork (1.0.0.0), and that’s why head A cannot communicate with worker C1.

Is it possible to solve this by manually configuring Ray or by port forwarding?

Dmitri · May 12, 2022, 3:32am

worker C1 opens a port on its local subnetworks (1.0.0.0)

@sangcho @jianxiao do you know what port that would be, and whether there’s a way to configure ray start so that the worker does the right thing?
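For reference, ray start does accept flags that pin the worker's advertised IP and its listening ports (`--node-ip-address`, `--node-manager-port`, `--object-manager-port`, `--min-worker-port`, `--max-worker-port`), which at least makes the ports predictable enough to forward. Below is a minimal sketch that assembles such a command; `worker_start_command` is a hypothetical helper, every port number passed to it is the caller's choice, and nothing in this thread confirms that pinning ports alone is enough behind NAT.

```python
import ipaddress
import shlex


def worker_start_command(head_address, node_ip, node_manager_port,
                         object_manager_port, min_worker_port, max_worker_port):
    """Build a `ray start` command line for a worker with pinned ports."""
    # --node-ip-address expects a bare IP; a value like "ip:port" raises ValueError.
    ipaddress.ip_address(node_ip)
    if min_worker_port > max_worker_port:
        raise ValueError("min_worker_port must not exceed max_worker_port")
    args = [
        "ray", "start",
        f"--address={head_address}",  # head's ip:port; a worker does not pass --head
        f"--node-ip-address={node_ip}",
        f"--node-manager-port={node_manager_port}",
        f"--object-manager-port={object_manager_port}",
        f"--min-worker-port={min_worker_port}",
        f"--max-worker-port={max_worker_port}",
    ]
    return shlex.join(args)
```

Two things in the original attempt stand out against this sketch: `--head` was passed on the worker, which starts a second head instead of joining A, and the value given to `--node-ip-address` included a colon suffix, which is consistent with the odd address in the GCS warning. As far as I know, Ray expects every node to be able to reach every other node on these ports, so each pinned port would also need a forwarding rule on the subnetwork's gateway.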
