Deploy a model which scales down to 0 replicas

plum9 · March 28, 2022, 4:34pm

Hi,
I want to create a function that deploys an ML model with autoscaling policies so that it only runs when load is present. The autoscaling policies currently don’t allow 0 as min_replicas. I want to bring the model down when there has been no load for a given period of time. Is there a way to do this in Ray Serve?
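To make the requested behavior concrete, here is a minimal sketch of an idle-timeout scaling decision: size replicas to the current load, and drop to 0 only after no requests have been in flight for a configurable window. This is not Ray Serve's API or its autoscaler. The class IdleScalePolicy and its parameters (max_replicas, target_requests_per_replica, idle_timeout_s) are hypothetical names chosen for illustration, and keeping exactly one warm replica during the idle window is an assumed simplification.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdleScalePolicy:
    """Hypothetical policy: scale with load, drop to 0 after an idle window."""
    max_replicas: int
    target_requests_per_replica: float
    idle_timeout_s: float
    # Last time (in seconds) any request was in flight; None means never.
    last_active_s: Optional[float] = None

    def desired_replicas(self, ongoing_requests: int, now_s: float) -> int:
        if ongoing_requests > 0:
            # Load is present: record the time, then size replicas for the load.
            self.last_active_s = now_s
            needed = math.ceil(ongoing_requests / self.target_requests_per_replica)
            return max(1, min(self.max_replicas, needed))
        # No load: scale to zero once the idle window has fully elapsed
        # (or if there has never been any traffic at all).
        if self.last_active_s is None or now_s - self.last_active_s >= self.idle_timeout_s:
            return 0
        # Still inside the idle window: keep one warm replica (assumed simplification).
        return 1


policy = IdleScalePolicy(max_replicas=4, target_requests_per_replica=2, idle_timeout_s=300)
print(policy.desired_replicas(ongoing_requests=5, now_s=0))    # 3
print(policy.desired_replicas(ongoing_requests=0, now_s=100))  # 1
print(policy.desired_replicas(ongoing_requests=0, now_s=400))  # 0
```

The trade-off with any scale-to-zero scheme is cold-start latency: the first request after the idle window has to wait for a replica to start and load the model, which is why the timeout is a tunable parameter rather than scaling down immediately.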

jiaodong · March 28, 2022, 5:54pm

I think you’re asking for scale-to-zero, which we’re actively looking into and scoping for Q2. If you have any particular asks or a description of your workload, it would be very helpful to file a feature request with that context in Issues · ray-project/ray · GitHub and tag us.

plum9 · March 29, 2022, 2:27am

@jiaodong Yes, you’re right. I want to scale a model down to 0 when there’s no load. I’ll file a feature request, thanks!
