My Ray version is 1.9.0.
I have run into an issue similar to this one: How to run tune on multiple machines? (Issue #9931, ray-project/ray on GitHub)
I ran ray start --head on server A. It printed the following message:
Local node IP: 192.168.173.153

--------------------
Ray runtime started.
--------------------

Next steps
  To connect to this Ray runtime from another node, run
    ray start --address='192.168.173.153:6379' --redis-password='5241590000000000'

  Alternatively, use the following Python code:
    import ray
    ray.init(address='auto', _redis_password='5241590000000000')

  To connect to this Ray runtime from outside of the cluster, for example to
  connect to a remote cluster from your laptop directly, use the following
  Python code:
    import ray
    ray.init(address='ray://<head_node_ip_address>:10001')

  If connection fails, check your firewall settings and network configuration.

  To terminate the Ray runtime, run
    ray stop
So starting the head node works fine.
But when I run
ray start --address='192.168.173.153:6379' --redis-password='5241590000000000'
on server B, it throws:
RuntimeError: Unable to connect to Redis at 192.168.173.153:6379 after 16 retries. Check that 192.168.173.153:6379 is reachable from this machine. If it is not, your firewall may be blocking this port. If the problem is a flaky connection, try setting the environment variable `RAY_START_REDIS_WAIT_RETRIES` to increase the number of attempts to ping the Redis server.
I tried to SSH into server A from server B with
ssh 192.168.173.153
and that works properly.
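A working SSH login only shows that port 22 on server A is reachable; it says nothing about port 6379, which is the one the error message names. A minimal sketch for checking that port directly from server B, using only the Python standard library, might look like the following. The helper name port_is_reachable and the 3-second default timeout are my own choices for illustration, not part of Ray.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it only tests raw TCP
    reachability and does not speak the Redis protocol or check the
    Redis password.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Calling port_is_reachable("192.168.173.153", 6379) on server B and getting False while SSH still works would point toward a firewall rule or network policy blocking port 6379, rather than a problem with Ray itself.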