How to connect to workers in subnetworks

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

Hi, everyone! I am trying to set up a distributed Ray cluster following manual-ray-cluster-setup.

I have successfully brought up the head node (A) and one worker (B), but I am having trouble bringing up workers in subnetworks (C1, C2).

Here is my network setup:

I run ray start --head --port=6379 to set up the head node and ray start --address=0.0.0.0:6379 to connect the workers. After running ray start on Worker C, the Ray dashboard shows that Worker C is connected, but the IP address it lists is the worker's local IP (1.0.0.0) rather than its external IP (0.0.0.2:1).

When I run ray status to check, the Worker C node is not listed. Moreover, when I try to schedule tasks on Worker C, they get stuck in pending. I think it might be an IP issue.
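For anyone reproducing this, a minimal sketch of how the addresses Ray has registered for each node could be listed from Python, to compare against the dashboard and ray status. It assumes a driver that can reach the running cluster; summarize_nodes and live_summary are hypothetical helper names, and only ray.init, ray.nodes and ray.shutdown are Ray calls.

```python
def summarize_nodes(nodes):
    """Return (address, alive) pairs from a list shaped like ray.nodes() output."""
    return [(n.get("NodeManagerAddress"), bool(n.get("Alive"))) for n in nodes]


def live_summary(address="auto"):
    """Connect to an already running cluster and summarize its nodes.

    Assumption: this is run on a machine that can reach the head node.
    """
    import ray  # imported here so summarize_nodes works without Ray installed

    ray.init(address=address)
    try:
        return summarize_nodes(ray.nodes())
    finally:
        ray.shutdown()


# Example usage on the head node:
#   for addr, alive in live_summary():
#       print(addr, "alive" if alive else "dead")
```

If Worker C shows up here with its local address, that would be consistent with the head node trying to reach it on an address it cannot route to.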

I have tried ray start --head --port=6379 --node-ip-address 0.0.0.2:1 to connect, but got the warning: Unable to connect to GCS at 0.0.0.2:1:6379.

I have struggled with these issues for days. Any information will be highly appreciated!

https://docs.ray.io/en/latest/ray-core/configure.html#ray-ports

Did you make sure all ports are properly open?
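One rough way to check this is to test TCP reachability in both directions (worker to head, and head to worker) with a small script. This is only a sketch: can_connect and check_ports are hypothetical helper names, the host below is a placeholder, and 6379 is simply the GCS port from the ray start command above. Any other ports you test should come from the ports page linked above or from the flags you passed to ray start.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: can_connect(host, port, timeout) for port in ports}


# Example usage from a worker (replace the placeholder with the head node's address):
#   print(check_ports("HEAD_NODE_IP", [6379]))
```

A successful connection only shows that the port is reachable at the TCP level. It does not show that Ray is advertising an address the other side can route to.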

Thanks for the timely reply!

I am sure that worker C1 can reach head A, and head A can SSH to worker C1 through its IP (0.0.0.2:1).

The problem is that worker C1 opens a port on its local subnetwork (1.0.0.0), which is why head A cannot communicate with worker C1.

Is it possible to solve this by manually configuring Ray or by port forwarding?

worker C1 opens a port on its local subnetworks (1.0.0.0)

@sangcho @jianxiao do you know what port that would be and if there’s a way to configure ray start for the worker to do the right thing?