Hi,
My Ray version is 1.9.0.
I have encountered an issue similar to this one: How to run tune on multiple machines? (Issue #9931, ray-project/ray on GitHub)
I ran ray start --head on server A, and it printed the following message:
Local node IP: 192.168.173.153
--------------------
Ray runtime started.
--------------------
Next steps
To connect to this Ray runtime from another node, run
ray start --address='192.168.173.153:6379' --redis-password='5241590000000000'
Alternatively, use the following Python code:
import ray
ray.init(address='auto', _redis_password='5241590000000000')
To connect to this Ray runtime from outside of the cluster, for example to
connect to a remote cluster from your laptop directly, use the following
Python code:
import ray
ray.init(address='ray://<head_node_ip_address>:10001')
If connection fails, check your firewall settings and network configuration.
To terminate the Ray runtime, run
ray stop
This step works fine.
But when I run ray start --address='192.168.173.153:6379' --redis-password='5241590000000000' on server B, it throws:
RuntimeError: Unable to connect to Redis at 192.168.173.153:6379 after 16 retries. Check that 192.168.173.153:6379 is reachable from this machine. If it is not, your firewall may be blocking this port. If the problem is a flaky connection, try setting the environment variable `RAY_START_REDIS_WAIT_RETRIES` to increase the number of attempts to ping the Redis server.
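The error message suggests raising RAY_START_REDIS_WAIT_RETRIES if the connection is merely flaky. A minimal sketch of re-running the same join command on server B with more retries might look like this. The value 60 and the helper name build_worker_start are arbitrary choices for illustration, and by default the script only prints the command (pass --run to actually execute it). More retries will not help if a firewall is blocking port 6379 outright.

```python
import os
import shlex
import subprocess
import sys

# Values copied from the "ray start --head" output on server A.
HEAD_ADDRESS = "192.168.173.153:6379"
REDIS_PASSWORD = "5241590000000000"


def build_worker_start(address: str, password: str, retries: int):
    """Hypothetical helper: return the `ray start` command plus an env with more Redis ping retries."""
    cmd = ["ray", "start", f"--address={address}", f"--redis-password={password}"]
    # Copy the current environment so os.environ itself is not modified.
    env = dict(os.environ)
    # Variable name taken from the RuntimeError text; environment values must be strings.
    env["RAY_START_REDIS_WAIT_RETRIES"] = str(retries)
    return cmd, env


if __name__ == "__main__":
    cmd, env = build_worker_start(HEAD_ADDRESS, REDIS_PASSWORD, retries=60)
    if len(sys.argv) > 1 and sys.argv[1] == "--run":
        # Actually join the cluster (run this on server B); check=False keeps Ray's own error output.
        subprocess.run(cmd, env=env, check=False)
    else:
        # Dry run: print the equivalent shell command instead of executing it.
        print("RAY_START_REDIS_WAIT_RETRIES=" + env["RAY_START_REDIS_WAIT_RETRIES"] + " " + shlex.join(cmd))
```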
SSH does work, though: running ssh 192.168.173.153 on server B connects to server A properly.
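A working SSH session only shows that port 22 on server A is reachable; it says nothing about port 6379, which is the one the worker is failing to reach. A minimal sketch for checking both ports from server B with a plain TCP connect is below. The port list and the 3-second timeout are assumptions for illustration, and this only tests raw TCP reachability, not whether Ray/Redis itself will accept the connection.

```python
import socket
import sys

# Head node IP taken from the "ray start --head" output above.
HEAD_IP = "192.168.173.153"
# 22 = SSH (known to work from server B); 6379 = the address the worker fails to reach.
PORTS = [22, 6379]


def is_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout subclasses OSError)
        # and "network unreachable" errors.
        return False


if __name__ == "__main__":
    host = sys.argv[1] if len(sys.argv) > 1 else HEAD_IP
    for port in PORTS:
        status = "reachable" if is_port_reachable(host, port) else "NOT reachable"
        print(f"{host}:{port} is {status}")
```

If 22 shows as reachable but 6379 does not, that would be consistent with a firewall on server A (or somewhere in between) blocking the Redis port, which is what the RuntimeError hints at.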