Autoscaling with `max_concurrent_queries = 1`

spolcyn · May 12, 2022, 2:29pm

How severe does this issue affect your experience of using Ray?

Low: It annoys or frustrates me for a moment.

Is it possible to use Ray Serve autoscaling with the deployment configured to have max_concurrent_queries = 1?

My reading of the source, specifically the error_ratio computation, is that this isn't possible, but it would be helpful if there were a way to do this (likely related to [Feature] Autoscaling Based on Full Request Queue · Issue #20977 · ray-project/ray · GitHub).

simon-mo · May 13, 2022, 5:50pm

Nice find. I do think this is a bug, or at least a missing configuration option. The autoscaling_policy reads:

    # Example: if error_ratio == 2.0, we have two times too many ongoing
    # requests per replica, so we desire twice as many replicas.
    error_ratio: float = (
        num_ongoing_requests_per_replica
        / autoscaling_config.target_num_ongoing_requests_per_replica
    )

    # Multiply the distance to 1 by the smoothing ("gain") factor (default=1).
    smoothed_error_ratio = 1 + ((error_ratio - 1) * autoscaling_config.smoothing_factor)
    desired_num_replicas = math.ceil(current_num_replicas * smoothed_error_ratio)
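
To make the arithmetic concrete, here is a minimal standalone sketch of just the quoted lines as a plain Python function. The name desired_replicas is hypothetical (it is not part of Ray's API), and the sketch leaves out whatever else the real policy does around this step.

```python
import math


def desired_replicas(
    current_num_replicas: int,
    num_ongoing_requests_per_replica: float,
    target_num_ongoing_requests_per_replica: float,
    smoothing_factor: float = 1.0,
) -> int:
    """Hypothetical re-implementation of the quoted scaling step (not Ray's API)."""
    # Ratio of observed load per replica to the target load per replica.
    error_ratio = num_ongoing_requests_per_replica / target_num_ongoing_requests_per_replica
    # Scale the ratio's distance from 1 by the smoothing ("gain") factor.
    smoothed_error_ratio = 1 + (error_ratio - 1) * smoothing_factor
    return math.ceil(current_num_replicas * smoothed_error_ratio)


# Two replicas, each with 4 ongoing requests against a target of 2:
# error_ratio = 2.0, so the replica count doubles.
print(desired_replicas(2, 4, 2))  # 4
# Same load with smoothing_factor = 0.5: smoothed ratio = 1.5, ceil(2 * 1.5) = 3.
print(desired_replicas(2, 4, 2, smoothing_factor=0.5))  # 3
# One replica exactly at its target of 1: error_ratio = 1.0, no change.
print(desired_replicas(1, 1, 1))  # 1
```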

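When num_ongoing_requests_per_replica = 1 and the target is 1, error_ratio = 1, so smoothed_error_ratio = 1 and desired_num_replicas stays equal to current_num_replicas (here, 1). Because max_concurrent_queries = 1 caps each replica at one ongoing request, the ratio can never rise above 1, so the deployment never scales up.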
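
The stuck case can be illustrated with a toy simulation. This is a hypothetical model, not how Ray Serve actually collects its metrics: it assumes each replica runs at most max_concurrent requests, that any excess waits in a queue the policy never sees, and that smoothing_factor is 1. The helper names desired_replicas and simulate are made up for illustration.

```python
import math


def desired_replicas(current: int, ongoing_per_replica: float, target: float) -> int:
    # Same arithmetic as the quoted policy, with smoothing_factor fixed at 1.
    error_ratio = ongoing_per_replica / target
    return math.ceil(current * error_ratio)


def simulate(pending_requests: int, max_concurrent: int, target: float, steps: int) -> list[int]:
    """Hypothetical model: each replica runs at most `max_concurrent` requests;
    requests beyond that wait in a queue the policy does not observe."""
    replicas = 1
    history = [replicas]
    for _ in range(steps):
        # Only requests actually running on replicas are counted as "ongoing".
        running = min(pending_requests, replicas * max_concurrent)
        ongoing_per_replica = running / replicas
        replicas = desired_replicas(replicas, ongoing_per_replica, target)
        history.append(replicas)
    return history


# 100 requests waiting, cap of 1 per replica, target of 1: stuck at 1 replica.
print(simulate(100, max_concurrent=1, target=1, steps=3))  # [1, 1, 1, 1]
# Same load with a cap of 4 and a target of 1: the ratio exceeds 1, so it grows.
print(simulate(100, max_concurrent=4, target=1, steps=3))  # [1, 4, 16, 64]
```

In this toy model the observed per-replica load can never exceed max_concurrent, so error_ratio is capped at max_concurrent / target. Whenever that cap is 1 or less, the replica count cannot grow, however long the queue gets.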

Can you please open a GitHub issue so we can track this as a blocker for graduating the autoscaling feature? Thanks!

spolcyn · May 13, 2022, 8:55pm

Done! [Serve] Can't autoscale deployment when target ongoing requests is 1 · Issue #24793 · ray-project/ray · GitHub
