Dynamically create/terminate serve deployments based on available capacity

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity


I have a use case where models get dynamically created and need to be served in the available memory of a cluster.

My cluster has 32 GB of RAM available, which is enough to serve approximately 10 models, so I can create 10 Ray Serve deployments around them. When a new model needs to be served, I first have to remove one of the existing 10 deployments and then deploy the newly arrived model.
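
The replace-on-arrival behaviour described above could be sketched roughly as follows. This is only an illustration of the bookkeeping, not an established Ray Serve pattern: the class name `DeploymentBudget`, the least-recently-used eviction policy, and the `deploy`/`remove` callbacks are all assumptions made for the example. In a real cluster the callbacks would presumably wrap calls such as `serve.run(...)` and `serve.delete(...)`, but the sketch itself does not depend on Ray.

```python
from collections import OrderedDict
from typing import Callable, List


class DeploymentBudget:
    """Tracks deployed models against a memory budget (hypothetical helper).

    When a new model does not fit, the least recently used models are
    removed until it does. The eviction policy is an assumption; the
    original question does not say which deployment should be removed.
    """

    def __init__(
        self,
        capacity_gb: float,
        deploy: Callable[[str], None],
        remove: Callable[[str], None],
    ) -> None:
        self.capacity_gb = capacity_gb
        self._deploy = deploy  # hypothetical hook, e.g. a wrapper around serve.run
        self._remove = remove  # hypothetical hook, e.g. a wrapper around serve.delete
        # Maps model name -> size in GB, ordered from least to most recently used.
        self._models: "OrderedDict[str, float]" = OrderedDict()

    def used_gb(self) -> float:
        """Total memory currently attributed to deployed models."""
        return sum(self._models.values())

    def touch(self, name: str) -> None:
        """Mark a deployed model as recently used."""
        self._models.move_to_end(name)

    def add(self, name: str, size_gb: float) -> List[str]:
        """Deploy a model, evicting older ones if needed.

        Returns the names of the models that were removed to make room.
        """
        if size_gb > self.capacity_gb:
            raise ValueError(f"{name} needs {size_gb} GB, budget is {self.capacity_gb} GB")
        if name in self._models:
            # Already deployed: only refresh its position in the LRU order.
            self.touch(name)
            return []
        evicted: List[str] = []
        # Remove least recently used models until the new one fits.
        while self.used_gb() + size_gb > self.capacity_gb:
            victim, _ = self._models.popitem(last=False)
            self._remove(victim)
            evicted.append(victim)
        self._deploy(name)
        self._models[name] = size_gb
        return evicted
```

With a 32 GB budget and models of about 3 GB each, the first ten `add` calls would deploy without evicting anything, and the eleventh would remove the least recently used model before deploying the new one. A supervisor actor or deployment could hold one such object and route all create/delete requests through it, so that the capacity check happens in a single place.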

I was wondering whether the “tree of actors” pattern can be used to tackle this. Can I create a supervisor deployment/actor that exposes functions to dynamically create, upscale, downscale, and delete deployments in a running cluster? Are there any side effects to creating deployments from within another actor?

Is there a standard pattern to achieve this in Ray Serve which I am missing?