I’m not understanding why Ray is trying to upscale when I’ve explicitly set my deployment to have a max replicas of 1. Can someone explain this to me? I’m running this locally.
Here’s the deployment:
@serve.deployment(
max_queued_requests=100,
ray_actor_options={"num_gpus": 1.0, "num_cpus": 2.0},
autoscaling_config=AutoscalingConfig(
min_replicas=0,
max_replicas=1,
idle_timeout_minutes=5,
upscale_delay_s=1.0,
),
logging_config=LoggingConfig(log_level="ERROR"),
)
And the warning:
- Deployment 'Predictor' in application 'app1' has 9 replicas that have taken more than 30s to be scheduled.