One large Ray cluster vs set of specialized clusters

blublinsky · March 24, 2021, 8:44pm

What is the best deployment practice: one large Ray cluster with dedicated resource groups, or a set of smaller dedicated clusters? The latter seems better from the point of view of maintainability, but raises the question of communication between different clusters. The former makes communication simpler, but raises the question of dynamically adding specific resource groups with custom library installations.
So what is the official answer?

Dmitri · March 24, 2021, 9:25pm

Depends on the particular application details, but generally I think the second setup is better – one Ray cluster per application concern.

@eoakes would you agree?

eoakes · March 25, 2021, 1:28am

Yes, unless you have a strong reason for sharing a cluster (like rapid scaling up and down of different applications), I’d suggest multiple clusters.

eoakes · March 25, 2021, 1:28am

Note that there is some support in the works for automatically handling dependencies specified per task/actor: [RFC] runtime_env for actors and tasks · Issue #14019 · ray-project/ray · GitHub
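
For illustration, here is a minimal sketch of per-task dependencies on a shared cluster. It assumes the `runtime_env` option accepts a dict with `pip` and `env_vars` keys, as in later Ray releases; the RFC linked above may differ in detail. `build_runtime_env` and `run_on_shared_cluster` are hypothetical names made up for this example, not part of Ray.

```python
from typing import Dict, List, Optional


def build_runtime_env(pip: List[str], env_vars: Optional[Dict[str, str]] = None) -> dict:
    """Build a per-task dependency spec (hypothetical helper, not part of Ray)."""
    # Deduplicate and sort so the same package list always yields the same spec.
    runtime_env: dict = {"pip": sorted(set(pip))}
    if env_vars:
        runtime_env["env_vars"] = dict(env_vars)
    return runtime_env


def run_on_shared_cluster() -> str:
    """Run one task with its own dependencies on an otherwise shared cluster."""
    import ray  # imported lazily so build_runtime_env works without Ray installed

    ray.init(ignore_reinit_error=True)

    # Assumption: runtime_env is accepted as a ray.remote option.
    @ray.remote(runtime_env=build_runtime_env(["requests"]))
    def requests_version() -> str:
        import requests  # installed only in this task's environment

        return requests.__version__

    return ray.get(requests_version.remote())
```

If this works as assumed, tasks with different library needs could share one cluster without custom installations on dedicated node groups, which is the main concern raised against the single-cluster setup.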

blublinsky · March 25, 2021, 1:45am

Thanks @Dmitri. In this case, is it possible to dynamically add specific resource groups to a running cluster?

blublinsky · March 25, 2021, 3:24am

Thanks guys, so how can I communicate between clusters? Submit a task from cluster A to cluster B?

mbehrendt · March 25, 2021, 10:16am

This is an interesting thread. When chatting with @rliaw about this, I think he described it as: 'if it’s a smaller workload, it would make sense to dynamically bring the Ray cluster up and down. But if it’s a larger workload (and/or one that is running continuously), then you’d rather have a longer-running larger shared cluster.'
From a pure architectural perspective, “one cluster per workload” feels a bit cleaner, but it might come with the downside of additional overhead. Also, you’d have to implement some code on the client side that brings up the cluster itself, which adds some delay before execution starts. So there are some trade-offs to consider, and we should discuss which of them matter most.

wdyt?

blublinsky · March 25, 2021, 2:24pm

I like the idea of dedicated clusters, but I was thinking of clusters that already exist, not necessarily of dynamic cluster creation. The question is: how do I submit from cluster A to cluster B?
