How severely does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
Is it possible to use Ray Serve autoscaling with the deployment configured to have max_concurrent_queries = 1?
My read of the source with the error_ratio computation is that this isn’t possible, but it’d be helpful if there were a way to do this (likely related to [Feature] Autoscaling Based on Full Request Queue · Issue #20977 · ray-project/ray · GitHub).
Nice find. I do think this is a bug/missing configuration here. The autoscaling_policy reads:
# Example: if error_ratio == 2.0, we have two times too many ongoing
# requests per replica, so we desire twice as many replicas.
error_ratio: float = (
    num_ongoing_requests_per_replica
    / autoscaling_config.target_num_ongoing_requests_per_replica
)
# Multiply the distance to 1 by the smoothing ("gain") factor (default=1).
smoothed_error_ratio = 1 + ((error_ratio - 1) * autoscaling_config.smoothing_factor)
desired_num_replicas = math.ceil(current_num_replicas * smoothed_error_ratio)
When num_ongoing_requests_per_replica=1 and target=1, smoothed_error_ratio=1, so desired_num_replicas stays at 1 (the current replica count) and the deployment never scales up.
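To make the arithmetic concrete, here is a standalone sketch of the computation quoted above. The function name desired_replicas and its plain arguments are made up for illustration; this is not Ray's API, and it leaves out any min/max replica bounds the real policy may apply.

```python
import math


def desired_replicas(
    current_num_replicas: int,
    num_ongoing_requests_per_replica: float,
    target_num_ongoing_requests_per_replica: float = 1.0,
    smoothing_factor: float = 1.0,
) -> int:
    """Hypothetical helper mirroring the quoted snippet, not Ray's actual function."""
    # Ratio of observed load to target load per replica.
    error_ratio = (
        num_ongoing_requests_per_replica
        / target_num_ongoing_requests_per_replica
    )
    # Scale the distance from 1 by the smoothing ("gain") factor.
    smoothed_error_ratio = 1 + (error_ratio - 1) * smoothing_factor
    return math.ceil(current_num_replicas * smoothed_error_ratio)


# Observed load equals the target: the ratio is 1 and the count does not change.
print(desired_replicas(current_num_replicas=1, num_ongoing_requests_per_replica=1.0))

# Only a ratio above 1 asks for more replicas.
print(desired_replicas(current_num_replicas=1, num_ongoing_requests_per_replica=2.0))
```

The first call returns 1 and the second returns 2: the policy only scales up when the observed per-replica load is above the target.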
Can you please open a GitHub issue so we can track this as a blocker for graduating the autoscaling feature? Thanks