- Medium: It causes significant difficulty in completing my task, but I can work around it.

Ray 2.4.0, Python 3.10, cluster on GCP
I create a GCP cluster with the following parameters: 1 head node with 1 CPU, worker nodes with 2 CPUs each, `min_workers: 0`, `max_workers: 5`, `upscaling_speed: 1.0`, `idle_timeout_minutes: 1`.
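For reference, a minimal sketch of the cluster launcher YAML these parameters correspond to (node type names, machine types, and region are my assumptions, not the exact config I use):

```yaml
cluster_name: serve-cluster        # hypothetical name
max_workers: 5
upscaling_speed: 1.0
idle_timeout_minutes: 1

provider:
  type: gcp
  region: us-central1              # assumption
  project_id: my-project           # assumption

available_node_types:
  head_node:                       # 1 CPU head
    resources: {"CPU": 1}
    node_config: {}                # machine details omitted
  worker_node:                     # 2 CPU workers
    min_workers: 0
    max_workers: 5
    resources: {"CPU": 2}
    node_config: {}                # machine details omitted

head_node_type: head_node
```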
I deploy a Ray Serve application on this cluster with the following config:
```yaml
deployments:
  - name: DLModelProcessor
    autoscaling_config:
      min_replicas: 0
      initial_replicas: 0
      max_replicas: 5
      target_num_ongoing_requests_per_replica: 1.0
      metrics_interval_s: 10.0
      look_back_period_s: 30.0
      smoothing_factor: 1.0
      downscale_delay_s: 120.0
      upscale_delay_s: 30.0
    ray_actor_options:
      num_cpus: 2.0
  - name: Backend
    autoscaling_config:
      min_replicas: 1
      initial_replicas: 1
      max_replicas: 1
      target_num_ongoing_requests_per_replica: 100.0
      metrics_interval_s: 10.0
      look_back_period_s: 30.0
      smoothing_factor: 1.0
      downscale_delay_s: 600.0
      upscale_delay_s: 30.0
    ray_actor_options:
      num_cpus: 1.0
```
When I send a request, I expect one DLModelProcessor replica to be created, so one new worker node should be launched and the replica should start running there. Instead, after the first new node launches (which takes about 3 minutes), the launch of a second node begins. For a while I have 2 worker nodes, and after 1 minute the second one is terminated because it is idle.
I have seen an issue about a race condition in the node launcher, but the advice given there (setting `provider: foreground_node_launch: True`) did not help me.
When I set both `provider: foreground_node_launch: True` and the environment variable `AUTOSCALER_UPDATE_INTERVAL_S=10`, the problem seems to be solved (in 5 out of 5 runs, only 1 node was launched). Either option alone does not help.
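For clarity, this is roughly how I apply the combined workaround in the cluster YAML (the exact `ray start` command line is a sketch; my real command includes more flags):

```yaml
provider:
  type: gcp
  foreground_node_launch: true   # launch nodes synchronously in the monitor loop

head_start_ray_commands:
  # Shorten the autoscaler update interval before starting the head.
  # The extra ray start flags here are placeholders, not my full command.
  - export AUTOSCALER_UPDATE_INTERVAL_S=10 && ray start --head --autoscaling-config=~/ray_bootstrap_config.yaml
```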
Here is an extract from monitor.log from a run where sending a request to Serve triggered the extra node provisioning.
Why do I see this behavior, and what is the correct way to solve this problem?