I used ray up ray/python/ray/autoscaler/gcp/defaults.yaml
to set up a cluster on GCP and start Ray.
I was able to use ray attach ray/python/ray/autoscaler/gcp/example-full.yaml
and then use ray on the remote node (using ray.init(address='auto')).
However, I could not connect to the head node from my local machine by specifying the IP address. For example:
kipnisal@AlonKs-MBP 18:47:24 ~/Ray_tests/PhaseDiagram % ray get-head-ip ray/python/ray/autoscaler/gcp/defaults.yaml
35.197.30.171
kipnisal@AlonKs-MBP 18:47:57 ~/Ray_tests/PhaseDiagram % ray status --address=35.197.30.171:6379
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/redis/connection.py", line 559, in connect
    sock = self._connect()
  File "/usr/local/lib/python3.8/site-packages/redis/connection.py", line 615, in _connect
    raise err
  File "/usr/local/lib/python3.8/site-packages/redis/connection.py", line 603, in _connect
    sock.connect(socket_address)
TimeoutError: [Errno 60] Operation timed out
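As a sanity check (a small helper I put together, not part of Ray), you can test raw TCP reachability of the Redis port before assuming the problem is in Ray itself. In my case this fails for the head IP and port above, which suggests the GCP firewall is dropping the connection:

```python
import socket

def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection handles address resolution and the timeout for us.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers both connection-refused and the timeout seen in the traceback.
        return False

# e.g. port_open("35.197.30.171", 6379) returned False for me,
# matching the "Operation timed out" error above.
```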
Hey @rliaw,
Thank you for your answer, though I am a bit confused about why additional setup is needed to communicate with the cluster.
Perhaps I should ask a more basic question. Say that 'experiment.py' contains

@ray.remote
def evaluate_iteration(par):
    return run_experiment(par)

res = [evaluate_iteration.remote(par) for par in params]
return_value = ray.get(res)
After setting up an autoscaling cluster using ray up cluster.yaml
I can use ray submit cluster.yaml experiment.py
to run experiment.py on the ray cluster.
Is it possible to run experiment.py on my local machine and submit Ray tasks to the cluster configured by cluster.yaml?
Hey @kipnisal, the reason additional setup is needed is a particular implementation detail of Ray: the driver worker process needs bidirectional network connectivity with the Ray services.
Thus, you need the Ray Client to work around that limitation. Specifically:
import ray
ray.init(address="...")

@ray.remote
def evaluate_iteration(par):
    return run_experiment(par)

res = [evaluate_iteration.remote(par) for par in params]
return_value = ray.get(res)
will not work if you want to run it on your local machine and submit to the Ray cluster configured by your YAML file.
However,
import ray
ray.util.connect("...:10001")

@ray.remote
def evaluate_iteration(par):
    return run_experiment(par)

res = [evaluate_iteration.remote(par) for par in params]
return_value = ray.get(res)
will work (you will be able to run this code on your local machine and submit these Ray remote tasks to the cluster).
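One caveat: port 10001 on the head node is usually not open to the public internet either. If I remember right, ray attach supports a -p/--port-forward flag, so one way to reach the Ray Client server is to tunnel it over the autoscaler's SSH connection (a sketch, assuming the default client port 10001; not something I have verified on GCP specifically):

```shell
# Terminal 1: forward local port 10001 to the head node over SSH.
ray attach cluster.yaml -p 10001

# Terminal 2: run the driver locally; experiment.py should then use
# ray.util.connect("127.0.0.1:10001") to reach the cluster through the tunnel.
python experiment.py
```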