Ray Data AutoScaler, scale down very slow

allendang · May 13, 2026, 3:29am

Hi team, a question on the Ray Data streaming executor’s autoscaler behavior.

When an upstream op fully finishes (no more inputs, all tasks drained), its resources
seem to be released gradually step-by-step rather than all at once. Downstream ops then
wait many scale-down ticks before they can scale up — really painful for CPU-preprocess
→ GPU-inference pipelines, where GPUs sit idle while CPU actors drain.

Is the step-wise scale-down intentional even after an op has fully completed, or would a
one-shot release on op completion be acceptable (perhaps behind a DataContext flag)?
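
To make the difference concrete, here is a toy model of the two release strategies. It is only a sketch and not Ray's actual autoscaler logic: the function names, the fixed tick interval, and the assumption that the downstream op can only scale up once every upstream actor is gone are all hypothetical simplifications.

```python
import math
from typing import Optional


def ticks_to_release(idle_actors: int, actors_per_tick: Optional[int]) -> int:
    """Ticks needed to free all idle actors of a finished op (toy model).

    actors_per_tick=None models a one-shot release on op completion;
    an integer models step-wise scale-down of that many actors per tick.
    """
    if idle_actors <= 0:
        return 0
    if actors_per_tick is None:
        return 1  # everything is released in a single tick
    if actors_per_tick <= 0:
        raise ValueError("actors_per_tick must be positive")
    return math.ceil(idle_actors / actors_per_tick)


def downstream_wait_seconds(
    idle_actors: int, actors_per_tick: Optional[int], tick_seconds: float
) -> float:
    """Time the downstream op waits, assuming it needs ALL upstream resources freed."""
    return ticks_to_release(idle_actors, actors_per_tick) * tick_seconds


if __name__ == "__main__":
    # Hypothetical numbers: 32 drained CPU actors, one autoscaler tick every 10 s.
    for label, step in [("step-wise", 1), ("one-shot", None)]:
        wait = downstream_wait_seconds(32, step, tick_seconds=10.0)
        print(f"{label}: downstream waits about {wait:.0f}s")
```

In this simplified model the wait grows linearly with the number of drained actors under step-wise release, while a one-shot release costs a single tick regardless of pool size.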

Currently on Ray 2.15.2 / Python 3.12; happy to retest on the latest release.
Full details: [Data/Autoscaler] Proposal: Release all resources of an upstream op at once when it finishes, instead of step-wise scale down (Issue #63299, ray-project/ray on GitHub)

cc anyone who owns this area

RunLLM · May 13, 2026, 3:30am

The Ray Data autoscaler currently scales down an upstream operator's resources gradually, step by step, even after the operator has fully finished processing. Downstream operators (e.g., GPU inference) can therefore wait unnecessarily for resources to be freed, which leads to pipeline stalls and underutilization. This step-wise scale-down is the default and is not explicitly documented as intentional. There is an open proposal to release all resources at once upon op completion to improve throughput and resource handoff, but as of the latest discussions and codebase it is neither implemented nor configurable via a DataContext flag (see issue discussion).

If you are on Ray 2.15.2, this behavior is present, and the community is considering changes to make resource release more immediate after op completion. There is no indication in the data sources that a one-shot release is currently available or configurable, but the proposal is open for feedback and may be adopted in future versions (see proposal details). Would you like a step-by-step breakdown of how the autoscaler currently handles resource release and why this behavior occurs?

Sources:

Ray Data autoscaler proposal: release all resources at once

