How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hi,
I tried to set up a “simple” Ray cluster by connecting two AWS machines using the following commands:
# machine1 designated as head_node
ray start --head
# it starts successfully and prints machine1_ip:6379
# on machine2
ray start --address='machine1_ip:6379'
Local node IP: machine2_ip
2023-01-14 20:56:15,350 WARNING utils.py:1346 -- Unable to connect to GCS at machine1_ip:6379. Check that (1) Ray GCS with matching version started successfully at the specified address, and (2) there is no firewall setting preventing access.
This is a known error (Launching an On-Premise Cluster — Ray 2.2.0), so I followed the troubleshooting step nc -vv -z $HEAD_ADDRESS $PORT
which shows a successful connection. Replacing machine1_ip
with the domain name does not work either. Both machines can SSH into each other using public keys, without typing a password. They are also running matching versions: python-3.10.0 and ray-2.1.0.
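For anyone who wants to repeat the reachability check without nc, here is a minimal Python sketch of what I understand that check to do: it only tries to open a TCP connection to the head node's port from machine2. The function name can_connect and the 3-second timeout are my own choices for illustration, not anything from Ray, and a successful result only shows that the port accepts connections, not that the GCS behind it is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper, roughly comparable to `nc -z host port`.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call from machine2 (replace the placeholder with the real head address):
# print(can_connect("machine1_ip", 6379))
```

Running this on machine2 against the head address and port 6379 should agree with the nc result above.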
What else can I do to pin down where the connection fails? I would appreciate any help!