1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.
2. Environment:
Ray version: 2.51
Python version: 3.11
OS: Linux
Cloud/Infrastructure: Kubernetes
Other libs/tools (if relevant):
3. What happened vs. what you expected:
I am running a training job on an autoscaling cluster. The training itself doesn't require many resources, but the autoscaler keeps requesting more and more CPUs. As a result, a lot of workers are created and then sit idle.
Actual: many pre-processing workers are created and idle for no clear reason
Expected: no autoscaling at all
I am mostly interested in understanding why the autoscaler keeps trying to bring more CPUs online despite there being no real need.
It keeps growing the CPU request over time:
Resized to 1084 CPUs, 1 GPUs.
Resized to 1212 CPUs, 1 GPUs.
... etc
This is a known issue with Ray Data and the autoscaler: Ray Data can trigger excessive autoscaling, so the cluster keeps adding CPU nodes even when they are not needed and ends up with many idle workers. It happens because Ray Data's autoscaling logic can launch a large number of tasks (for example during repartition or map operations) and request CPUs aggressively, without always applying backpressure or limiting the number of concurrent tasks, which leads to over-provisioning. The autoscaler reacts to these resource requests rather than to actual CPU utilization, so it keeps scaling up as long as outstanding requests exist, even if the tasks are not using the CPUs efficiently or have already completed. This behavior is documented in several open issues and discussions, and there is currently no complete fix; the main workarounds are to cap the maximum number of CPU nodes in your cluster configuration or to adjust your data pipeline so it does not trigger excessive parallelism, e.g. by limiting partitions or concurrency (Ray Data/LLM Non stop CPU autoscaling with vLLM, Ray Data scaleup unlimited resources when processing data, Ray keeps adding nodes beyond Dataset.map concurrency).
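For the pipeline-side workaround, limiting the concurrency of the pre-processing step keeps Ray Data from queuing up far more tasks than the training loop can consume, which in turn limits what the autoscaler sees as demand. A minimal sketch, assuming a `map_batches` pre-processing step; the `preprocess` function, dataset, batch size, and concurrency value below are placeholders, and the `concurrency` argument assumes a recent Ray release where `map_batches` accepts it:

```python
import ray

ray.init()  # or ray.init(address="auto") when running on the cluster

# Hypothetical pre-processing UDF; replace with your real transform.
def preprocess(batch):
    batch["id"] = batch["id"] * 2  # placeholder work
    return batch

# Placeholder dataset; substitute your real read_* call.
ds = ray.data.range(100_000)

# Cap the number of concurrent map tasks so Ray Data does not queue up far
# more work than the training step consumes. With at most 8 concurrent
# 1-CPU tasks, the autoscaler sees a demand of roughly 8 CPUs for this
# stage instead of an ever-growing request.
ds = ds.map_batches(
    preprocess,
    batch_size=1024,  # placeholder; tune to your data
    num_cpus=1,       # CPUs requested per task
    concurrency=8,    # max concurrent tasks for this operator
)

ds.take(1)  # trigger execution on a small sample
```

The key point is that the autoscaler only ever sees the per-operator demand you allow, so pick a concurrency that matches what the trainer can actually ingest.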
The autoscaler does not scale based on actual CPU usage; it scales based on the resource requests generated by pending Ray tasks and actors. If your pipeline or Ray Data job creates many tasks or partitions, the autoscaler interprets this as demand for more CPUs, regardless of real utilization. To prevent this, set a hard limit on the number of worker nodes or CPUs in your Ray cluster config, and consider tuning your data pipeline to reduce the number of concurrent tasks or blocks (Ray Cluster Key Concepts). Would you like more detail on how to adjust your pipeline or cluster config to mitigate this?
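If you prefer a global cap over tuning every operator, you can bound the cluster itself (on Kubernetes, via maxReplicas on the worker group in the KubeRay RayCluster spec) or bound what Ray Data will request through its execution options. A hedged sketch of the latter, assuming the ExecutionResources / DataContext API exposed in recent Ray releases (attribute names and the chosen limits are illustrative; check the docs for the version you run):

```python
import ray
from ray.data import ExecutionResources

ray.init()

# Cap the total resources Ray Data's streaming executor will try to use.
# With a CPU budget in place, Ray Data stops emitting resource requests
# beyond it, so the autoscaler has no growing demand to react to.
ctx = ray.data.DataContext.get_current()
ctx.execution_options.resource_limits = ExecutionResources(
    cpu=32,                                # placeholder CPU budget for the whole pipeline
    object_store_memory=20 * 1024**3,      # optional memory cap, also a placeholder
)

# Any Ray Data pipeline built after this point runs under the cap.
ds = ray.data.range(100_000).map_batches(lambda b: b, batch_size=1024)
ds.take(1)
```

Combining a cluster-level ceiling (maxReplicas) with a pipeline-level budget like this is usually enough to stop the "Resized to ... CPUs" growth you are seeing.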