How severely does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
Is it possible to use Ray Serve autoscaling with a deployment configured to have max_concurrent_queries = 1?
My read of the source, specifically the error_ratio computation, is that this isn't possible, but it would be helpful if there were a way to do this (likely related to the GitHub issue "[Feature] Autoscaling Based on Full Request Queue", ray-project/ray #20977).
Nice find. I do think there is a bug or a missing configuration option here. The autoscaling_policy reads:
# Example: if error_ratio == 2.0, we have two times too many ongoing
# requests per replica, so we desire twice as many replicas.
error_ratio: float = (
# Multiply the distance to 1 by the smoothing ("gain") factor (default=1).
smoothed_error_ratio = 1 + ((error_ratio - 1) * autoscaling_config.smoothing_factor)
desired_num_replicas = math.ceil(current_num_replicas * smoothed_error_ratio)
desired_num_replicas stay as
Can you please open a GitHub issue so we can track this as a blocker for graduating the autoscaling feature? Thanks.
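For anyone following along, here is a rough sketch of why the quoted computation seems unable to scale up in this case. It is not Ray's actual code: `desired_replicas` and its parameters are hypothetical names, and since the excerpt above cuts off the right-hand side of `error_ratio`, the sketch assumes it is the average number of ongoing requests per replica divided by a per-replica target. It also assumes a replica never reports more ongoing requests than max_concurrent_queries.

```python
import math


def desired_replicas(ongoing_per_replica, target_per_replica=1.0, smoothing_factor=1.0):
    """Hypothetical stand-in for the quoted policy; not Ray's actual function."""
    current_num_replicas = len(ongoing_per_replica)
    if current_num_replicas == 0:
        raise ValueError("need at least one replica")

    # ASSUMPTION: error_ratio = average ongoing requests per replica / target.
    # The quoted excerpt truncates this expression, so this is a guess at its shape.
    avg_ongoing = sum(ongoing_per_replica) / current_num_replicas
    error_ratio = avg_ongoing / target_per_replica

    # These two lines mirror the quoted excerpt.
    smoothed_error_ratio = 1 + ((error_ratio - 1) * smoothing_factor)
    return math.ceil(current_num_replicas * smoothed_error_ratio)


# Cap of 1 per replica: even fully saturated replicas report at most 1 each,
# so the ratio is at most 1 and the replica count does not grow.
print(desired_replicas([1, 1, 1]))  # 3

# With a higher cap, replicas can report 2 each and the policy doubles the count.
print(desired_replicas([2, 2, 2]))  # 6
```

Under these assumptions, with a target of 1 the ratio can reach 1 but never exceed it when every replica is capped at one in-flight request, so any requests waiting in the queue are invisible to the policy.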