Autoscaler launches extra nodes

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Ray 2.4.0, Python 3.10, cluster on GCP.
I create a GCP cluster with the following parameters:
1 head node with 1 CPU, workers with 2 CPUs each, 0 min workers, 5 max workers, upscaling_speed 1.0, idle_timeout_minutes 1.
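As a rough sketch, these settings correspond to a cluster YAML along the following lines. The cluster name, project ID, region, zone, node type names, and machine types are illustrative placeholders, not values taken from the actual config, and the disk and image settings a real GCP node_config needs are omitted.

```yaml
# Sketch only: names, project, region, and machine types are placeholders.
cluster_name: serve-autoscaling-demo    # hypothetical name
max_workers: 5
upscaling_speed: 1.0
idle_timeout_minutes: 1

provider:
  type: gcp
  region: us-central1                   # placeholder
  availability_zone: us-central1-a      # placeholder
  project_id: my-project-id             # placeholder

available_node_types:
  head_node:                            # hypothetical node type name
    resources: {"CPU": 1}
    node_config:
      machineType: n1-standard-1        # assumed 1-vCPU machine type
  worker_node:                          # hypothetical node type name
    min_workers: 0
    max_workers: 5
    resources: {"CPU": 2}
    node_config:
      machineType: n1-standard-2        # assumed 2-vCPU machine type

head_node_type: head_node
```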
I deploy a Ray Serve application on this cluster with the following parameters:

deployments:

- name: DLModelProcessor
  autoscaling_config:
    min_replicas: 0
    initial_replicas: 0
    max_replicas: 5
    target_num_ongoing_requests_per_replica: 1.0
    metrics_interval_s: 10.0
    look_back_period_s: 30.0
    smoothing_factor: 1.0
    downscale_delay_s: 120.0
    upscale_delay_s: 30.0
  ray_actor_options:
    num_cpus: 2.0

- name: Backend
  autoscaling_config:
    min_replicas: 1
    initial_replicas: 1
    max_replicas: 1
    target_num_ongoing_requests_per_replica: 100.0
    metrics_interval_s: 10.0
    look_back_period_s: 30.0
    smoothing_factor: 1.0
    downscale_delay_s: 600.0
    upscale_delay_s: 30.0
  ray_actor_options:
    num_cpus: 1.0

When I send a request, I expect 1 DLModelProcessor replica to be created, so 1 new worker node should be launched and the replica should start working there. But after 1 new node is launched (this takes about 3 minutes), a second new node starts launching. For some time I have 2 worker nodes, and after 1 minute the second one is terminated because it is idle.
I have seen an issue about a race condition in the node launcher, but the advice from there (setting provider: foreground_node_launch: True) has not helped me.
When I set both provider: foreground_node_launch: True and the environment variable AUTOSCALER_UPDATE_INTERVAL_S=10, the problem seems to be solved (in 5 out of 5 runs only 1 node was launched). Either option on its own does not help.
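A sketch of how the two settings might be combined in the cluster YAML. Setting the variable inline in head_start_ray_commands is an assumption about where it is applied (the autoscaler monitor runs on the head node and inherits the environment of ray start); the ray start flags shown follow the stock example configs and may differ in a real setup.

```yaml
provider:
  type: gcp
  # region, availability_zone, project_id stay as in the existing config
  foreground_node_launch: True

head_start_ray_commands:
  - ray stop
  # Assumption: the env var is passed to the head's `ray start`,
  # so the autoscaler monitor process picks it up.
  - >-
    AUTOSCALER_UPDATE_INTERVAL_S=10
    ray start --head --port=6379
    --object-manager-port=8076
    --autoscaling-config=~/ray_bootstrap_config.yaml
```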
Here is an extract from monitor.log from a run where I send a request to Serve and an extra node is provisioned.

Why do I see this behavior? And what is the correct way to solve this problem?
