Production best practices for Ray Serve

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes significant difficulty to completing my task, but I can work around it.

Hey folks, I’m looking for some guidance on production best practices. To give some context, I have a number of independent models deployed on different endpoints using `DAGDriver.bind({"/model1": Model1.bind(), "/model2": Model2.bind(), …})`.
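For concreteness, here's a minimal sketch of that kind of setup (`Model1` and `Model2` are placeholder deployments; any `@serve.deployment` class works the same way):

```python
from ray import serve
from ray.serve.drivers import DAGDriver


@serve.deployment
class Model1:
    async def __call__(self, request):
        # Placeholder for real model inference.
        return "model1 result"


@serve.deployment
class Model2:
    async def __call__(self, request):
        return "model2 result"


# DAGDriver routes each URL prefix to its bound deployment graph,
# so both models are served from one application.
app = DAGDriver.bind({"/model1": Model1.bind(), "/model2": Model2.bind()})
```

Since everything is bound into this one graph, a code change to either model redeploys the whole application.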

Looking at this documentation, if I make code changes to one of these models, a new cluster is started and the previous one is terminated, which requires the Kubernetes cluster to be large enough to schedule both Ray clusters simultaneously. Having to tear down and set up all model deployments when changes are made to just one seems unnecessary, and it also raises scalability concerns if the Kubernetes cluster isn’t large enough.

Is there a better way to approach this to avoid teardown + setup of all deployments?

cc: @Sihan_Wang @shrekris for ideas

Hi @smitkiri-klaviyo, welcome to the forums!

If you bind your models into a single graph, and you issue a code update to one of them, this will cause a rolling update to all the models. Graphs are deployed as a unit, and their upgrades happen as a unit.

Serve recently added experimental support for running multiple apps on a single Ray cluster. You can see the RFC here and the recent changes here. We plan to add support for this in the REST API in the coming weeks, which will let you use it with KubeRay RayServices.

If you use the nightly version of Ray, you can launch multiple apps on a single Ray cluster by running serve.run(your_graph, name=…), where name is the application name. You can split each of your models into a separate application, and when you upgrade an application, only that model will be upgraded. Please note that this API is still experimental and may change in the future.
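As a rough sketch of what that looks like (application names, route prefixes, and the deployments themselves are placeholders, and this assumes a nightly build with the experimental multi-app API):

```python
from ray import serve


@serve.deployment
class Model1:
    async def __call__(self, request):
        return "model1 result"


@serve.deployment
class Model2:
    async def __call__(self, request):
        return "model2 result"


# Each model becomes its own named application with its own route.
serve.run(Model1.bind(), name="app1", route_prefix="/model1")
serve.run(Model2.bind(), name="app2", route_prefix="/model2")

# Re-running serve.run for "app1" with updated code upgrades only that
# application; "app2" keeps serving traffic uninterrupted.
```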

1 Like

Thank you! If I understand correctly, binding all models to a single graph is the only way to deploy multiple models on a single endpoint, right? But it sounds like it’s being addressed in the RFC, really looking forward to it.

With multiple applications on a single ray cluster, are there any concerns / guidance on how many applications we can / should run on a single ray cluster?

Yeah, currently binding all the models into a single graph is the only way to deploy them onto a single Ray cluster.

After the RFC is addressed, you should be able to split each model into its own graph and deploy all the graphs onto a single Ray cluster. Each graph would be independently upgradeable.

> With multiple applications on a single ray cluster, are there any concerns / guidance on how many applications we can / should run on a single ray cluster?

Generally, for production, we’d recommend each Ray cluster support a single use case. If all your models are totally independent, it may be better to run them on different clusters for better fault tolerance and isolation. If you’d prefer, you should still be able to run them on a single cluster without hitting any scaling limits. However, if the cluster goes down, this would affect all the deployments running on it.

1 Like

Hello @shrekris

I’m also concerned about this issue. If I create a different Ray cluster for each application, each of them only runs 1 or 2 replicas, and there would be hundreds of clusters in one K8s cluster.

Is it recommended to run one small application per Ray cluster? Is there a more appropriate way to handle this case?

Thank you so much!

Hi @chaolin, at this point we’ve introduced multi-app support, so multiple Serve apps can run in a single Ray cluster. Since you seem to be running many small apps, you can use multi-app support to pack them into one cluster.