What is the correct way to connect to a Ray cluster?

Hi,

This might be simple, but I couldn’t find anything in the documentation about the proper way to connect to a deployed cluster. (I’ve successfully deployed two different clusters in GCP, one on GKE and the other on VMs, using the provided yaml files.)

I see that you can use commands to trigger a one-off run script via ray submit, and that I can ssh into the instances and run simple python commands there. But what is the recommended way to connect and keep a stable connection to a cluster that I can use as a ray.util.multiprocessing.Pool?

I’ve tried creating an ssh tunnel and using pool = Pool(ray_address="127.0.0.1:6379"), but the connection is not stable and I get a lot of connection timeouts.

What would be the best way to expose my Ray cluster and connect to it from outside to run processes there? What am I missing here?

Thanks in advance

Mauricio

hey @mauricio,

Unfortunately, you’ll have to run your scripts on the cluster head node for now! We’re currently working on a client interface for your ideal workflow (with the ssh tunnel), but it is still a work in progress.

ray submit and ray rsync-up are the commands I typically use to run things for now.
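For illustration, here is a minimal sketch of a script that could be sent to the head node with ray submit and use ray.util.multiprocessing.Pool there. The file name pool_job.py and the helpers square and run_on_cluster are hypothetical, and ray_address="auto" assumes a Ray cluster is already running on the machine where the script executes.

```python
# pool_job.py (hypothetical file name): meant to run ON the head node,
# for example via `ray submit <cluster yaml> pool_job.py`.


def square(x):
    """Toy workload; replace with your own function."""
    return x * x


def run_on_cluster(values, ray_address="auto"):
    """Map `square` over `values` with a Ray-backed Pool.

    ray_address="auto" is an assumption: it expects a Ray cluster
    to be running already on the machine executing this script.
    """
    # Imported lazily so the module can be loaded where Ray is not installed.
    from ray.util.multiprocessing import Pool

    pool = Pool(ray_address=ray_address)
    try:
        return pool.map(square, list(values))
    finally:
        # Release the Pool's worker actors once the work is done.
        pool.terminate()
```

To actually run work, add a call such as print(run_on_cluster(range(8))) at the bottom of the script before submitting it. Because the script runs next to the cluster, no tunnel to port 6379 is involved.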

Thank you @rliaw - Any ETA of when this client interface will be available?

Roughly 2 or 3 more months? You can track development here: https://github.com/ray-project/ray/milestone/12

Awesome, thank you very much.

Hi @rliaw,
I think I ran into the same problem. Is it documented somewhere that you must run your scripts on the cluster head node? I only found out by chance via this thread.

If you really do need to run your script from the head node, that would also explain this long-standing GitHub issue: “Redis has started but no raylets have registered yet” (ray-project/ray issue #8152).

Also, in the documentation it says: “To run a distributed Ray program, you’ll need to execute your program on the same machine as one of the nodes.”, i.e. not restricted to running it on the head node.

https://docs.ray.io/en/master/cluster/index.html#manual-cluster

No, you can actually run it on any node. The example in that long-standing GitHub issue (specifically, this example) doesn’t run the Ray script properly. In containerized settings, the script must be run inside the same container in which you call ray start.
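As a rough way to confirm that a script is attached to the running cluster from whichever node (or container) it runs on, something like the sketch below may help. The helper names count_alive_nodes and check_cluster are hypothetical; address="auto" assumes ray start was already called on the same machine or in the same container.

```python
def count_alive_nodes(nodes):
    """Count the entries flagged as alive in a ray.nodes()-style list of dicts."""
    return sum(1 for node in nodes if node.get("Alive"))


def check_cluster(address="auto"):
    """Attach to the local Ray instance and return the number of alive nodes.

    address="auto" is an assumption: it expects `ray start` to have been
    run on this machine (or inside this container) beforehand.
    """
    # Imported lazily so the helper above stays usable without Ray installed.
    import ray

    ray.init(address=address)
    try:
        return count_alive_nodes(ray.nodes())
    finally:
        ray.shutdown()
```

If check_cluster() raises a connection error or reports fewer nodes than expected, the script is probably not running alongside the ray start process.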

Happy to answer any more questions you might have, though I think we should move this discussion to another thread!

Ray client is now usable! @mauricio

https://docs.ray.io/en/master/ray-client.html

import ray
import ray.util

ray.util.connect("0.0.0.0:50051")  # replace with the appropriate host and port

# Normal Ray code follows
@ray.remote
def do_work(x):
    return x ** x

# Submit the task and block until its result is available (2 ** 2 == 4)
result = ray.get(do_work.remote(2))
#....

@will that’s great news!

Thank you very much. I’ll give it a try.

Mauricio