Dynamically create/terminate serve deployments based on available capacity

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

Hi,

I have a use case where models get dynamically created and need to be served in the available memory of a cluster.

Example:
My cluster has 32 GB of RAM available, which is enough to serve approximately 10 models, so I can create 10 Ray Serve deployments around them. If a new model needs to be served, I first need to remove one of the existing 10 deployments and then deploy the newly arrived model.

I was wondering if the “tree of actors” pattern can be used to tackle this. Can I create a supervisor deployment/actor that exposes methods to dynamically create, scale up, scale down, and delete deployments in a running cluster? Are there any side effects to creating deployments from within another actor?

Is there a standard pattern to achieve this in Ray Serve which I am missing?

Hi @prabalakanti, you can certainly manage deployments from within another actor or deployment!

To add some detail: creating deployments from within another actor has no side effects. If you don’t plan to scale the supervisor itself using num_replicas, it may be simpler to create it as a plain Ray actor instead of a deployment.
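
For the bookkeeping side of such a supervisor, here is a minimal sketch in plain Python of one way it could work. It assumes a least-recently-used eviction policy and a per-model memory estimate, neither of which comes from the question. The class name CapacitySupervisor and the deploy/delete hooks are hypothetical; the hooks are injected so the logic can be tried without a cluster.

```python
from collections import OrderedDict
from typing import Callable, List


class CapacitySupervisor:
    """Keeps deployed models within a memory budget, evicting the least recently used."""

    def __init__(
        self,
        budget_gb: float,
        deploy: Callable[[str], None],  # hypothetical hook that starts a deployment
        delete: Callable[[str], None],  # hypothetical hook that removes a deployment
    ) -> None:
        self.budget_gb = budget_gb
        self._deploy = deploy
        self._delete = delete
        # Maps model name -> estimated size in GB, ordered oldest-used first.
        self._models: "OrderedDict[str, float]" = OrderedDict()

    def used_gb(self) -> float:
        return sum(self._models.values())

    def touch(self, name: str) -> None:
        """Mark a model as recently used so it is evicted last."""
        if name in self._models:
            self._models.move_to_end(name)

    def add_model(self, name: str, size_gb: float) -> List[str]:
        """Deploy a model, evicting older ones if needed. Returns the evicted names."""
        if size_gb > self.budget_gb:
            raise ValueError(f"{name} ({size_gb} GB) exceeds the total budget")
        if name in self._models:
            self.touch(name)
            return []
        evicted: List[str] = []
        # Free memory by removing the least recently used models first.
        while self.used_gb() + size_gb > self.budget_gb:
            victim, _ = self._models.popitem(last=False)
            self._delete(victim)
            evicted.append(victim)
        self._deploy(name)
        self._models[name] = size_gb
        return evicted


if __name__ == "__main__":
    supervisor = CapacitySupervisor(
        budget_gb=32.0,
        deploy=lambda n: print("deploy", n),
        delete=lambda n: print("delete", n),
    )
    for i in range(11):
        supervisor.add_model(f"model-{i}", 3.0)
```

Inside a Ray actor, the two hooks could wrap serve.run(app, name=...) and serve.delete(name). That mapping is an assumption, so please check it against the Serve API of the Ray version you are running.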