How severely does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
Hi,
I have a use case where models get dynamically created and need to be served in the available memory of a cluster.
Example:
My cluster has 32 GB of RAM available, which is enough to serve approximately 10 models, so I can create 10 Ray Serve deployments, one per model. If a new model needs to be served, I first have to remove one of the existing 10 deployments and then deploy the newly arrived model.
I was wondering whether the “tree of actors” pattern can be used to tackle this. Can I create a supervisor deployment/actor that exposes functions to dynamically create, scale up, scale down, and delete deployments in a running cluster? Are there any side effects of creating deployments from within another actor?
Is there a standard pattern to achieve this in Ray Serve which I am missing?
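To make the eviction step in the example concrete, here is a minimal sketch of the bookkeeping such a supervisor might do. It is plain Python and does not call Ray at all: `ModelSlots`, `ensure_deployed`, and the `deploy`/`undeploy` callbacks are hypothetical names, and the least-recently-used eviction policy is an assumption (any other policy would fit the same shape). In a real setup the callbacks would presumably wrap whatever Ray Serve calls create and remove a deployment.

```python
from collections import OrderedDict
from typing import Callable, List


class ModelSlots:
    """Tracks which models are deployed and evicts the least recently used
    one when all slots are taken. Hypothetical helper, not a Ray API."""

    def __init__(
        self,
        capacity: int,
        deploy: Callable[[str], None],
        undeploy: Callable[[str], None],
    ) -> None:
        self.capacity = capacity
        self._deploy = deploy      # assumed to create a deployment for a model
        self._undeploy = undeploy  # assumed to remove that deployment
        self._active: "OrderedDict[str, None]" = OrderedDict()

    def ensure_deployed(self, model_id: str) -> None:
        # Already deployed: just mark it as most recently used.
        if model_id in self._active:
            self._active.move_to_end(model_id)
            return
        # No free slot: remove the least recently used model first.
        if len(self._active) >= self.capacity:
            victim, _ = self._active.popitem(last=False)
            self._undeploy(victim)
        self._deploy(model_id)
        self._active[model_id] = None

    def active(self) -> List[str]:
        # Ordered from least to most recently used.
        return list(self._active)


if __name__ == "__main__":
    events = []
    slots = ModelSlots(
        capacity=2,
        deploy=lambda m: events.append(("deploy", m)),
        undeploy=lambda m: events.append(("undeploy", m)),
    )
    for model in ["a", "b", "a", "c"]:
        slots.ensure_deployed(model)
    print(slots.active())  # ['a', 'c']
    print(events)
```

With a capacity of 2, requesting `c` evicts `b` rather than `a`, because `a` was used more recently. For the cluster in my example the capacity would be 10.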