What is the correct way to connect to a Ray cluster?

mauricio · December 15, 2020, 5:03pm

Hi,

This might be simple, but I couldn’t find anything in the documentation on the proper way to connect to a deployed cluster. (I’ve successfully deployed two different clusters on GCP: one on GKE and the other on VMs, using the provided YAML files.)

I see that I can run a one-off script via ray submit, and that I can ssh into the instances and run simple Python commands there. But what is the recommended way to connect to a cluster, and keep a stable connection to it, so that I can use it as a ray.util.multiprocessing.Pool?

I’ve tried creating an SSH tunnel and using pool = Pool(ray_address="127.0.0.1:6379"), but besides being unstable, this gives me a lot of connection timeouts.
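
One way to tell a dead or half-open tunnel apart from a slow cluster is to check that the forwarded port actually accepts TCP connections before handing the address to Pool. The sketch below uses only the standard library; port_is_open is a hypothetical helper name, and 6379 is simply the port from the tunnel described above. An open port only shows that the tunnel is up, not that Ray will accept the connection.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout subclasses OSError)
        # and name-resolution errors
        return False


# Hypothetical usage: only hand the address to Ray once the local end of the tunnel answers.
# from ray.util.multiprocessing import Pool
# if port_is_open("127.0.0.1", 6379):
#     pool = Pool(ray_address="127.0.0.1:6379")
```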

What would be the best way to expose my Ray cluster and connect to it from outside so I can run processes there? What am I missing here?

Thanks in advance

Mauricio

rliaw · December 16, 2020, 7:34pm

hey @mauricio,

unfortunately you’ll have to run your scripts on the cluster head node for now! We’re currently working on a client interface for your ideal workflow (with the ssh tunnel), but this is still work in progress.

ray submit and ray rsync-up are the commands I currently use to run things on the cluster.

mauricio · December 17, 2020, 4:32pm

Thank you @rliaw - Any ETA of when this client interface will be available?

rliaw · December 17, 2020, 5:21pm

Roughly 2 or 3 more months? You can track development here: https://github.com/ray-project/ray/milestone/12

mauricio · December 18, 2020, 2:42am

Awesome, thank you very much.

Maltimore · January 19, 2021, 2:34pm

Hi @rliaw,
I think I ran into the same problem. Is it documented anywhere that you must run your scripts on the cluster head node? I only found out by chance through this thread.

If you really do need to run your script from the head node, that would also explain this long-standing GitHub issue: “Redis has started but no raylets have registered yet” (Issue #8152, ray-project/ray).

Also, in the documentation it says: “To run a distributed Ray program, you’ll need to execute your program on the same machine as one of the nodes.”, i.e. not restricted to running it on the head node.

https://docs.ray.io/en/master/cluster/index.html#manual-cluster

rliaw · January 25, 2021, 8:14am

No, you can actually run it on any node. The long-standing GitHub issue (specifically, this example) doesn’t run the Ray script properly. In containerized settings, the script must be run inside the same container in which you call ray start.

Happy to answer any more questions you might have, though I think we should move this discussion to another thread!

will · January 27, 2021, 9:44pm

Ray client is now usable! @mauricio

https://docs.ray.io/en/master/ray-client.html

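import ray
import ray.util

# Connect to the Ray client server exposed by the cluster
ray.util.connect("0.0.0.0:50051")  # replace with the appropriate host and port

# Normal Ray code follows
@ray.remote
def f(x):
    return x ** x

# Submit the task to the cluster and fetch its result (2 ** 2 == 4)
result = ray.get(f.remote(2))
# ...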
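
Since the original question was about unstable connections, a small retry loop around the connect call can make a script more forgiving. This is a generic sketch, not part of Ray's API: connect_with_retry is a hypothetical helper that accepts any connect function (for example ray.util.connect as used above), and it assumes failures surface as ordinary Python exceptions, since the exact exception types Ray raises are not covered in this thread.

```python
import time
from typing import Any, Callable


def connect_with_retry(
    connect: Callable[[str], Any],
    address: str,
    attempts: int = 5,
    delay: float = 1.0,
    backoff: float = 2.0,
) -> Any:
    """Call connect(address), retrying with exponential backoff on failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            return connect(address)
        except Exception as exc:  # assumption: any exception means "try again"
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            print(f"connect attempt {attempt} failed ({exc!r}); retrying in {wait:.1f}s")
            time.sleep(wait)
            wait *= backoff


# Hypothetical usage (requires Ray and a reachable client server):
# import ray.util
# connect_with_retry(ray.util.connect, "<head-node-host>:50051")
```

Retrying only papers over transient failures; if every attempt times out, the address or the tunnel itself is the more likely culprit.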

mauricio · January 28, 2021, 2:32pm

@will that’s great news!

thank you very much. I’ll give it a try.

Mauricio
