I want to create a function that deploys an ML model with autoscaling policies only when load is present. The autoscaling policies currently don’t allow 0 as min_replicas. I want to bring the model down when there has been no load for a given period of time. Is there a way to do this in Ray Serve?
I think you’re asking for scale to zero, which we’re actively looking into and scoping in Q2. If you have any particular asks or a description of your workload, it would be very helpful to file a feature request with that context on the ray-project/ray GitHub Issues page and tag us.
@jiaodong Yes, you’re right. I want to scale a model down to 0 when there’s no load. I’ll file a feature request, thanks!
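In the meantime, the idle-timeout behaviour described above could be approximated outside of Ray Serve. The following is only a rough sketch in plain Python and not a Ray Serve feature: `IdleScaler`, `on_request`, `tick`, and the `deploy`/`teardown` callbacks are all hypothetical names, and the callbacks stand in for whatever calls you use to bring the model up and down.

```python
import time
from typing import Callable, Optional


class IdleScaler:
    """Hypothetical helper: deploy on first request, tear down after an idle period."""

    def __init__(
        self,
        deploy: Callable[[], None],      # placeholder for your own deploy call
        teardown: Callable[[], None],    # placeholder for your own teardown call
        idle_timeout_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._deploy = deploy
        self._teardown = teardown
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._deployed = False
        self._last_request: Optional[float] = None

    @property
    def deployed(self) -> bool:
        return self._deployed

    def on_request(self) -> None:
        """Call this for every incoming request, before routing it to the model."""
        if not self._deployed:
            self._deploy()
            self._deployed = True
        self._last_request = self._clock()

    def tick(self) -> bool:
        """Call this periodically; returns True if the model was torn down."""
        if not self._deployed or self._last_request is None:
            return False
        if self._clock() - self._last_request >= self._idle_timeout_s:
            self._teardown()
            self._deployed = False
            return True
        return False
```

You would call `on_request()` from whatever sits in front of the model and run `tick()` on a timer. Note that the first request after a teardown pays the full deployment cost, which is the usual trade-off with scale to zero.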