Ray autoscaling despite hard limit on number of replicas

Josiah_Reeves · December 5, 2024, 6:01pm

I don’t understand why Ray is trying to upscale when I’ve explicitly set max_replicas to 1 for my deployment. Can someone explain this to me? I’m running this locally.

Here’s the deployment:

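from ray import serve
from ray.serve.config import AutoscalingConfig
from ray.serve.schema import LoggingConfig

@serve.deployment(
    max_queued_requests=100,
    ray_actor_options={"num_gpus": 1.0, "num_cpus": 2.0},
    autoscaling_config=AutoscalingConfig(
        min_replicas=0,
        max_replicas=1,  # intended hard cap of one replica
        idle_timeout_minutes=5,
        upscale_delay_s=1.0,
    ),
    logging_config=LoggingConfig(log_level="ERROR"),
)
class Predictor:  # name taken from the warning below; body not shown in the post
    ...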

And the warning:
- Deployment 'Predictor' in application 'app1' has 9 replicas that have taken more than 30s to be scheduled.

cindy_zhang · December 6, 2024, 1:10am

Hi @Josiah_Reeves, can you paste the controller logs and the Serve config? If you’re running locally, you can get the Serve config by running curl http://localhost:8265/api/serve/applications/; otherwise, the app config from the Ray Dashboard would also help!
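
If it's easier to do from Python than with curl, here is a rough sketch that reads the same endpoint and counts how many replica entries each deployment lists. The helper names (fetch_serve_details, count_replicas) are made up for illustration, and the JSON layout (applications -> deployments -> replicas) is an assumption about the response shape, so adjust it if your Ray version returns something different.

```python
import json
from urllib.request import urlopen

# URL taken from the reply above; change host/port if your dashboard differs.
SERVE_APPS_URL = "http://localhost:8265/api/serve/applications/"


def fetch_serve_details(url: str = SERVE_APPS_URL, timeout: float = 10.0) -> dict:
    """Return the JSON body served at the Serve applications endpoint."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def count_replicas(details: dict) -> dict:
    """Map (application, deployment) -> number of replica entries listed.

    Assumes the layout applications -> <app> -> deployments -> <name> -> replicas
    (hypothetical; missing keys are treated as empty rather than raising).
    """
    counts = {}
    for app_name, app in (details.get("applications") or {}).items():
        for dep_name, dep in (app.get("deployments") or {}).items():
            counts[(app_name, dep_name)] = len(dep.get("replicas") or [])
    return counts


# Usage against a running local cluster:
#   print(count_replicas(fetch_serve_details()))
```

Comparing that count against max_replicas may help show whether the replicas in the warning are ones Serve is actually tracking for the deployment.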
