How severely does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
- Medium: It contributes significant difficulty to completing my task, but I can work around it.
My cluster is not removing idle worker nodes. Here are two autoscaler status outputs taken about five minutes apart:
======== Autoscaler status: 2023-04-12 12:25:54.310061 ========
Node status
Healthy:
 1 ray_head_default
 1 ray_worker_on_demand_large
 1 ray_worker_preemptible_small
Pending:
 (no pending nodes)
Recent failures:
 ray_worker_preemptible_small: RayletUnexpectedlyDied (ip: 10.128.0.36)

Resources
Usage:
 0.0/40.0 CPU
 0.00/141.962 GiB memory
 1.02/61.156 GiB object_store_memory

Demands:
 (no resource demands)

======== Autoscaler status: 2023-04-12 12:30:56.685294 ========
Node status
Healthy:
 1 ray_head_default
 1 ray_worker_on_demand_large
 1 ray_worker_preemptible_small
Pending:
 (no pending nodes)
Recent failures:
 ray_worker_preemptible_small: RayletUnexpectedlyDied (ip: 10.128.0.36)

Resources
Usage:
 0.0/40.0 CPU
 0.00/141.962 GiB memory
 1.02/61.156 GiB object_store_memory

Demands:
 (no resource demands)

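My config file specifies an idle timeout of 2 minutes. Both snapshots show 0.0/40.0 CPU in use and no resource demands, so I expected the workers to be scaled down. The autoscaler did seem to remove some other nodes that had been idle, but these two worker nodes (ray_worker_on_demand_large and ray_worker_preemptible_small) are not being terminated. Any advice would be greatly appreciated!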
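To double-check my reading of the two snapshots, here is a minimal Python sketch (standard library only) that parses the header timestamp and the Usage lines of each status output and compares the gap against the 2-minute timeout. The names `parse_status` and `IDLE_TIMEOUT_MINUTES` are my own illustrative choices, not part of Ray, and the regexes assume the exact text format shown above.

```python
import re
from datetime import datetime

# Header and Usage lines copied verbatim from the two autoscaler status snapshots.
FIRST_SNAPSHOT = """\
======== Autoscaler status: 2023-04-12 12:25:54.310061 ========
Usage:
 0.0/40.0 CPU
 0.00/141.962 GiB memory
 1.02/61.156 GiB object_store_memory
"""

SECOND_SNAPSHOT = """\
======== Autoscaler status: 2023-04-12 12:30:56.685294 ========
Usage:
 0.0/40.0 CPU
 0.00/141.962 GiB memory
 1.02/61.156 GiB object_store_memory
"""

# Idle timeout from my cluster config, in minutes (illustrative constant).
IDLE_TIMEOUT_MINUTES = 2

# Captures the timestamp in the "Autoscaler status" header line.
HEADER_RE = re.compile(r"Autoscaler status: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)")
# Captures used, total, and resource name from lines like " 1.02/61.156 GiB object_store_memory".
USAGE_RE = re.compile(r"^\s*([\d.]+)/([\d.]+)\s+(?:GiB\s+)?(\w+)\s*$", re.MULTILINE)


def parse_status(text):
    """Return (timestamp, {resource: (used, total)}) for one status snapshot."""
    match = HEADER_RE.search(text)
    if match is None:
        raise ValueError("no 'Autoscaler status' header found")
    timestamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
    usage = {
        name: (float(used), float(total))
        for used, total, name in USAGE_RE.findall(text)
    }
    return timestamp, usage


first_time, first_usage = parse_status(FIRST_SNAPSHOT)
second_time, second_usage = parse_status(SECOND_SNAPSHOT)

# Elapsed time between the two snapshots, in minutes.
gap_minutes = (second_time - first_time).total_seconds() / 60

# The Usage figures are cluster-wide totals, not per-node numbers.
cpu_idle_in_both = first_usage["CPU"][0] == 0.0 and second_usage["CPU"][0] == 0.0

print(f"Gap between snapshots: {gap_minutes:.2f} min (idle timeout: {IDLE_TIMEOUT_MINUTES} min)")
print(f"No CPU in use at either snapshot: {cpu_idle_in_both}")
print(f"Gap exceeds idle timeout: {gap_minutes > IDLE_TIMEOUT_MINUTES}")
```

Run against my snapshots, it reports a gap of about 5.04 minutes with no CPU in use at either sample point, which is more than twice the configured timeout. One caveat: the Usage section of the status output is a cluster-wide total, so this check can't rule out per-node activity between the two samples.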