Cannot connect to head node from local machine

I used
ray up ray/python/ray/autoscaler/gcp/defaults.yaml
to set up a cluster on GCP and start Ray.

I was able to use
ray attach ray/python/ray/autoscaler/gcp/example-full.yaml
and then use Ray on the remote node (using ray.init(address='auto')).

However, I could not connect to the head node from my local machine by specifying the IP address. For example:

kipnisal@AlonKs-MBP 18:47:24 ~/Ray_tests/PhaseDiagram % ray get-head-ip ray/python/ray/autoscaler/gcp/defaults.yaml
35.197.30.171
kipnisal@AlonKs-MBP 18:47:57 ~/Ray_tests/PhaseDiagram % ray status --address=35.197.30.171:6379
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/redis/connection.py", line 559, in connect
    sock = self._connect()
  File "/usr/local/lib/python3.8/site-packages/redis/connection.py", line 615, in _connect
    raise err
  File "/usr/local/lib/python3.8/site-packages/redis/connection.py", line 603, in _connect
    sock.connect(socket_address)
TimeoutError: [Errno 60] Operation timed out

@Dmitri, any idea why I can’t connect?

Hey @kipnisal, connecting from your local machine to the head node requires a special Ray Client connection:

https://docs.ray.io/en/master/ray-client.html#ray-client

Specifically, I think you just need to do:

ray.util.connect("<path_to_remote_node>:10001")

1. Ray should start the Client server process automatically, so you don’t need to start the server manually.
2. You may need to forward port 10001 from the remote node to your local machine, or expose it to the public internet on GCE.
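
Before calling ray.util.connect, it can help to confirm that port 10001 is actually reachable from the local machine, since a blocked port produces the same kind of timeout shown in the traceback above. The following is a minimal sketch using only the Python standard library; the function name port_is_reachable and the 3-second timeout are illustrative choices, not part of Ray.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for diagnosing connectivity; it only checks that
    something accepts TCP connections, not that it is a Ray Client server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False
```

For example, port_is_reachable("35.197.30.171", 10001) returning False would suggest that the port still needs to be forwarded or opened in the GCE firewall.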

Hey @rliaw,
Thank you for your answer, although I am a bit confused about why additional setup is needed to communicate with the cluster.

Perhaps I should ask a more basic question. Say that ‘experiment.py’ contains

@ray.remote
def evaluate_iteration(par):
    return run_experiment(par)

res = [evaluate_iteration.remote(par) for par in params]
return_value = ray.get(res)

After setting up an autoscaling cluster using
ray up cluster.yaml
I can use
ray submit cluster.yaml experiment.py
to run experiment.py on the ray cluster.

Is it possible to run experiment.py on my local machine and have it submit Ray tasks to the cluster configured by cluster.yaml?

Hey @kipnisal, additional setup is needed because of a particular implementation detail of Ray: the driver worker process needs bidirectional network connectivity with the Ray services.

Thus, you need the Ray Client to work around that limitation. Specifically:

import ray
ray.init(address="...")

@ray.remote
def evaluate_iteration(par):
    return run_experiment(par)

res = [evaluate_iteration.remote(par) for par in params]
return_value = ray.get(res)

will not work if you want to run it on your local machine and submit tasks to the Ray cluster configured by the YAML file.

However,

import ray
ray.util.connect("...:10001")

@ray.remote
def evaluate_iteration(par):
    return run_experiment(par)

res = [evaluate_iteration.remote(par) for par in params]
return_value = ray.get(res)

will work: you will be able to run this code on your local machine and submit these Ray remote tasks to the cluster.
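
The address passed to ray.util.connect can be built from the output of ray get-head-ip rather than typed by hand. Here is a rough sketch, assuming the ray CLI is on the PATH, that the head IP is the last non-empty line the command prints, and that the installed Ray version provides ray.util.connect. The helper names parse_head_ip, client_address, and connect_to_cluster are hypothetical.

```python
import subprocess


def parse_head_ip(cli_output):
    """Return the last non-empty line of `ray get-head-ip` output.

    Assumption: any log lines come before the IP address.
    """
    lines = [line.strip() for line in cli_output.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no head IP found in CLI output")
    return lines[-1]


def client_address(cluster_yaml, port=10001):
    """Ask the Ray CLI for the head IP and append the Ray Client port."""
    result = subprocess.run(
        ["ray", "get-head-ip", cluster_yaml],
        check=True,
        capture_output=True,
        text=True,
    )
    return f"{parse_head_ip(result.stdout)}:{port}"


def connect_to_cluster(cluster_yaml):
    """Connect the local driver to the cluster through Ray Client."""
    import ray  # deferred so the helpers above can be used without Ray installed

    ray.util.connect(client_address(cluster_yaml))
```

Calling connect_to_cluster("cluster.yaml") before defining the remote functions would then take the place of the hard-coded ray.util.connect("...:10001") line, provided port 10001 is reachable from the local machine.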

I see. Thank you for all the help!