I started up a Ray cluster by following the GCP full example. The cluster comes up fine:
ray up cluster.yml
....
....
--------------------
Ray runtime started.
--------------------
Next steps
To connect to this Ray runtime from another node, run
ray start --address='xxxx:6379' --redis-password='xxxx'
Alternatively, use the following Python code:
import ray
ray.init(address='auto', _redis_password='xxx')
If connection fails, check your firewall settings and network configuration.
To terminate the Ray runtime, run
ray stop
Shared connection to xxx closed.
2021-07-30 20:37:25,729 INFO node_provider.py:21 -- wait_for_compute_zone_operation: Waiting for operation operation-1627677447407-5c85d300fc8b5-b5496585-6dd22005 to finish...
2021-07-30 20:37:31,244 INFO node_provider.py:33 -- wait_for_compute_zone_operation: Operation operation-1627677447407-5c85d300fc8b5-b5496585-6dd22005 finished.
New status: up-to-date
Useful commands
Monitor autoscaling with
ray exec xxxxcluster.yml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor*'
Connect to a terminal on the cluster head:
ray attach xxxx/cluster.yml
Get a remote shell to the cluster manually:
ssh -tt -o IdentitiesOnly=yes -i /root/.ssh/xxx ubuntu@xxx docker exec -it ray_container /bin/bash
First, Ray is handing me back the GCP internal IP rather than the public IP of the node.
Second, immediately after getting this output, I run the following command on my local machine and get this unexpected error:
root@730a84ac15f8:/workspaces/taskengine# ray status
Traceback (most recent call last):
  File "/usr/local/bin/ray", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/ray/scripts/scripts.py", line 1923, in main
    return cli()
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/ray/scripts/scripts.py", line 1487, in status
    address = services.get_ray_address_to_use_or_die()
  File "/usr/local/lib/python3.9/site-packages/ray/_private/services.py", line 220, in get_ray_address_to_use_or_die
    find_redis_address_or_die())
  File "/usr/local/lib/python3.9/site-packages/ray/_private/services.py", line 232, in find_redis_address_or_die
    raise ConnectionError(
ConnectionError: Could not find any running Ray instance. Please specify the one to connect to by setting `address`.
Can you please help me understand why ray status cannot find the cluster when I run it from my local machine?
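
To help narrow down the internal-IP part of this, here is a minimal sketch of a plain TCP reachability check that can be run from the local machine against the address printed by ray up. It uses only the Python standard library and does not talk to Ray at all; the helper name can_reach and the example IP in the comment are hypothetical placeholders, and port 6379 is simply the one shown in the output above.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable addresses.
        return False


# Example usage (the IP is a hypothetical placeholder for the head address
# that `ray up` printed; 6379 is the port shown in that output):
#   print(can_reach("10.128.0.2", 6379))
```

If this returns False for the internal IP, the address is probably not routable from outside the GCP network. That would explain the first point, though not necessarily the ray status error.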