Ray Data autoscaler scales down very slowly

Hi team :waving_hand: Question on the Ray Data streaming executor's autoscaler behavior.

When an upstream op fully finishes (no more inputs, all tasks drained), its resources
seem to be released step by step rather than all at once. Downstream ops then have to
wait many scale-down ticks before they can scale up. This is really painful for
CPU-preprocess → GPU-inference pipelines, where GPUs sit idle while CPU actors drain.
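
To put a rough number on the cost, here is a toy model of the two release policies. It is not Ray's actual autoscaler logic: the function names, the fixed number of actors released per tick, the tick length, and the assumption that the downstream op needs all of the freed resources before it can scale up are all hypothetical, chosen only to illustrate the difference.

```python
import math
from typing import Optional


def ticks_until_released(idle_actors: int, released_per_tick: Optional[int]) -> int:
    """Ticks needed to release every idle actor of a finished op (toy model).

    released_per_tick=None models a one-shot release on op completion;
    an integer models a step-wise scale-down that frees that many actors per tick.
    """
    if idle_actors <= 0:
        return 0
    if released_per_tick is None:
        return 1  # everything is freed on the first tick after completion
    return math.ceil(idle_actors / released_per_tick)


def extra_wait_seconds(idle_actors: int, released_per_tick: int, tick_seconds: int) -> int:
    """Extra time a downstream op waits under step-wise release vs. one-shot release."""
    stepwise = ticks_until_released(idle_actors, released_per_tick)
    one_shot = ticks_until_released(idle_actors, None)
    return (stepwise - one_shot) * tick_seconds


# Hypothetical numbers: 64 idle CPU actors, 1 released per tick, 10 s per tick.
print(extra_wait_seconds(idle_actors=64, released_per_tick=1, tick_seconds=10))  # 630
```

With these made-up numbers the downstream op waits about ten extra minutes; the real gap depends on how many actors the autoscaler removes per tick and how often it ticks.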

Is the step-wise scale-down intentional even after an op has fully completed, or would a
one-shot release on op completion be acceptable (perhaps behind a DataContext flag)?

Currently on Ray 2.15.2 / Python 3.12, happy to retest on latest.
Full details: ray-project/ray issue #63299, "[Data/Autoscaler] Proposal: Release all resources of an upstream op at once when it finishes, instead of step-wise scale down"

cc anyone who owns this area :folded_hands:

Ray Data's autoscaler currently scales down an upstream operator's resources gradually, step by step, even after the operator has fully finished processing. Downstream operators (e.g., GPU inference) can therefore wait unnecessarily for resources to be freed, which leads to pipeline stalls and underutilization. This step-wise scale-down is the default behavior, and it is not explicitly documented as intentional. There is an open proposal to release all of an operator's resources at once when it completes, which would improve throughput and resource handoff, but as of the latest discussions and codebase it is neither implemented nor configurable via a DataContext flag (see the issue discussion).

This behavior is present on Ray 2.15.2, and the community is considering changes to make resource release more immediate after op completion. The available sources give no indication that a one-shot release is currently available or configurable, but the proposal is open for feedback and may be adopted in a future version (see the proposal details). Would you like a step-by-step breakdown of how the autoscaler currently handles resource release and why this behavior occurs?
