Dynamically create/terminate serve deployments based on available capacity

prabalakanti · May 2, 2022, 6:14am

How severe does this issue affect your experience of using Ray?

None: Just asking a question out of curiosity

Hi,

I have a use case where models get dynamically created and need to be served in the available memory of a cluster.

Example:
My cluster has 32 GB of RAM available, which is enough to serve approximately 10 models, so I can create 10 Ray Serve deployments for them. If a new model needs to be served, I first need to remove one of the existing 10 deployments and then deploy the newly arrived model.

I was wondering if the “tree of actors” pattern can be used to tackle this. Can I create a supervisor deployment/actor that exposes functions to dynamically create, upscale, downscale, and delete deployments in a running cluster? Are there any side effects to creating deployments from within another actor?

Is there a standard pattern to achieve this in Ray Serve which I am missing?

architkulkarni · June 6, 2022, 10:23pm

Hi @prabalakanti, you can certainly manage deployments from within another actor or deployment!

architkulkarni · June 14, 2022, 7:46pm

To give more details, there are no side effects from creating deployments within another actor. For your supervisor actor, if you don’t plan to scale it up using num_replicas, it might be simpler to just create it as a Ray Actor instead of a Deployment.
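
For illustration, here is a minimal sketch of the bookkeeping such a supervisor might do. It is not an official Ray Serve pattern. The class name `DeploymentSupervisor`, the `deploy_fn`/`delete_fn` callbacks, and the least-recently-used eviction policy are all assumptions made for this example; the original question does not say which deployment should be removed. The logic is kept free of Ray imports so it can be run on its own. In a real cluster you would likely wrap the class with `@ray.remote` and have the callbacks call the Serve API for your Ray version (for example `serve.run(..., name=...)` and `serve.delete(name)` in Ray 2.x, or `Deployment.deploy()` / `Deployment.delete()` in the 1.x releases current when this thread was written); please check the docs for the version you run.

```python
from collections import OrderedDict
from typing import Callable, List, Optional


class DeploymentSupervisor:
    """Keeps at most `capacity` models deployed, evicting the least recently used.

    Hypothetical helper: `deploy_fn` and `delete_fn` stand in for whatever
    Ray Serve calls create and remove a deployment in your Ray version.
    """

    def __init__(
        self,
        capacity: int,
        deploy_fn: Callable[[str], None],
        delete_fn: Callable[[str], None],
    ) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._deploy_fn = deploy_fn
        self._delete_fn = delete_fn
        # Insertion order doubles as recency order: oldest entry first.
        self._active: "OrderedDict[str, None]" = OrderedDict()

    def ensure_deployed(self, model_name: str) -> Optional[str]:
        """Deploy `model_name` if needed; return the evicted model name, if any."""
        if model_name in self._active:
            # Already serving: just mark it as most recently used.
            self._active.move_to_end(model_name)
            return None

        evicted = None
        if len(self._active) >= self.capacity:
            # Free a slot first, as described in the question.
            evicted, _ = self._active.popitem(last=False)
            self._delete_fn(evicted)

        self._deploy_fn(model_name)
        self._active[model_name] = None
        return evicted

    def active_models(self) -> List[str]:
        """Names of currently deployed models, least recently used first."""
        return list(self._active)
```

With a capacity of 10, the eleventh call to `ensure_deployed` with a new model name would first delete the least recently used deployment and then deploy the new one. A fixed slot count is only a rough proxy for memory; if model sizes vary a lot, you would probably want to track estimated memory per model instead.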
