How to create a distributed cluster

  • High: It blocks me from completing my task.

Hi, I’m new to Ray and exploring a scenario where computers from different regions are added to a cluster.

I have an AWS server that helps me set up a reverse tunnel, along with two Linux machines in different regions for starting the head node and worker node. The head node runs on Linux1, while the worker node runs on Linux2.

First, I created a head node on Linux1 using the following command:

ray start --head --port=7799

Next, I set up a reverse tunnel on the server so that when accessing port 7002 on the server, the traffic is forwarded to the head node application on Linux1 at port 7799.

Then, on Linux2, I used the following command to connect to the cluster:

ray start --address=server.com:7002

However, the worker node disconnects a while after joining the cluster. When I open the dashboard on Linux1, the newly added node is marked as dead. It is also not possible to submit tasks to the cluster from Linux2.
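
For anyone reproducing this setup, a basic TCP probe run from Linux2 can at least confirm that the forwarded port is accepting connections. The sketch below uses only the Python standard library. `port_is_reachable` is a hypothetical helper name, and `server.com:7002` is the tunnel endpoint from the post. A successful probe only shows that this one port is open. It does not show that the cluster will stay healthy, since Ray nodes also communicate with each other on additional ports that a single forwarded port does not cover.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example call from Linux2 against the tunnel endpoint in the post:
#   port_is_reachable("server.com", 7002)
```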

Hey @Dunty_Z, cross-region Ray clusters aren't recommended or supported. If you want resiliency, I would stick to a zonal cluster.

Hi @Sam_Chan, if I have already created a cluster, can I use it remotely to execute my tasks?

If it’s in a single zone, of course; long-running Ray clusters are a pretty common pattern. Things break down when you try to spread a cluster across multiple cloud regions.
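
As a rough illustration of using an existing long-running cluster from another machine, here is a minimal sketch based on Ray Client. It assumes the head node is directly reachable from the client, that the Ray Client server is listening on its default port 10001, and that the client runs matching Ray and Python versions. `ray_client_address` and `run_remote_square` are hypothetical helper names, and the host you pass in is a placeholder for your own head node address.

```python
def ray_client_address(host: str, port: int = 10001) -> str:
    """Build a Ray Client URL; 10001 is the default Ray Client server port."""
    return f"ray://{host}:{port}"


def run_remote_square(host: str, x: int) -> int:
    """Connect to an existing cluster via Ray Client and run one task on it."""
    # Imported lazily so the address helper above works without Ray installed.
    import ray

    ray.init(address=ray_client_address(host))
    try:
        @ray.remote
        def square(n: int) -> int:
            return n * n

        # The task executes on the cluster; the result is fetched to the client.
        return ray.get(square.remote(x))
    finally:
        # Always disconnect so the client session is released.
        ray.shutdown()
```

Calling `run_remote_square("<head-node-host>", 7)` would run the task on the cluster rather than locally. For longer-running workloads, submitting a job to the cluster may fit better than an interactive client connection.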
