I set up a GCP instance to serve as the head node (cluster coordinator) for executing Ray tasks. The idea is to use local PCs scattered around the world as worker nodes. After starting the head node (a regular ray start --head), I try to add my local PC with ray.init(address=<external_ip_of_gcp_instance>:6379, _redis_password=<password>), but the process hangs indefinitely at -- Connecting to existing Ray cluster at address: <external_ip_of_gcp_instance>:6379. I tracked the hang down to somewhere around global_state.get_node_to_connect_for_driver(node_ip_address) (line 286 of services.py). I did forward the port on the instance (inbound, TCP 6379). The same thing happens on both Windows and Linux.
I also tried tunneling the [<internal_ip_of_gcp_instance>:]6379 port through SSH and then passing localhost:6379 as the address, but then I got redis.exceptions.ConnectionError: Error 10061.
If I do the same for ports 8265 and 10001, I can access the dashboard and the client service both from the external IP and through the SSH tunnel via localhost. So the issue seems to be related to the Redis server setup.
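For anyone who wants to reproduce the reachability check, here is a minimal sketch that can be run from the local PC. The helper names port_open and check_ports and the HOST placeholder are my own and are not part of Ray. It only tests whether a TCP connection can be opened on each port, not whether Redis or Ray answers correctly behind it.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Placeholder: replace with the external IP of the GCP instance,
    # or keep 127.0.0.1 when testing through an SSH tunnel.
    HOST = "127.0.0.1"
    # Ports mentioned above: Redis, dashboard, client service
    for port, ok in check_ports(HOST, [6379, 8265, 10001]).items():
        print(f"{HOST}:{port} -> {'open' if ok else 'unreachable'}")
```

If 6379 shows as open here while ray.init still hangs, the problem is probably not at the basic TCP level of that one port, which would fit the suspicion that something else in the Redis-based connection setup is involved.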
Any help to make this work would be much appreciated.