Hello everyone, we’re building a multi-model batch inference pipeline and so far, we’re enjoying Ray a lot! However, I’ve noticed one problem with autoscaling when using stateful maps (with Actors). Our pipeline looks as follows:
read_parquet → preprocess (load images etc., on CPU) → annotate (on GPU) → preprocess → annotate → … → write_parquet
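For reference, the pipeline is built roughly like this (a minimal sketch; the `Preprocess`/`Annotate` class bodies, paths, and stage count are placeholders, not our real code):

```python
import ray

class Preprocess:
    # CPU stage: load/decode images etc. (placeholder)
    def __call__(self, batch):
        return batch

class Annotate:
    # GPU stage: runs one of the models (placeholder)
    def __init__(self):
        pass  # load model weights here
    def __call__(self, batch):
        return batch

ds = ray.data.read_parquet("s3://bucket/input/")
for _ in range(3):  # one preprocess + annotate pair per model
    ds = ds.map_batches(Preprocess, concurrency=(1, 16))
    ds = ds.map_batches(Annotate, concurrency=(1, 16), num_gpus=1)
ds.write_parquet("s3://bucket/output/")
```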
Say we have 3 models in this pipeline and we want to use autoscaling, so we set concurrency to (1, 16) for every map operator. In the beginning, Ray will then assign many GPUs to the first annotator and fewer to the 2nd and 3rd. This uses the resources fully, so the scaling algorithm is satisfied, but the pipeline is unbalanced: the first annotator outputs far more records than the 2nd and 3rd can consume. Backpressure won’t be triggered, because the rows are very small and don’t take much space in the object store, and every worker has a long queue of blocks to process. But if the job crashes, all the progress of the first annotators is lost, because Ray doesn’t have a resume feature.
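To make the imbalance concrete, here is a toy back-of-the-envelope simulation (plain Python; the worker counts and per-worker rate are made-up numbers, not measurements from our cluster). It just shows that when the first stage keeps more workers than the second, the inter-stage queue grows linearly, and everything in that queue is work that would be lost on a crash:

```python
def queued_after(steps, producers, consumers, rate_per_worker=100):
    """Rows waiting between stage 1 and stage 2 after `steps` ticks,
    assuming each worker processes `rate_per_worker` rows per tick."""
    backlog = 0
    for _ in range(steps):
        backlog += producers * rate_per_worker            # stage 1 output
        backlog -= min(backlog, consumers * rate_per_worker)  # stage 2 intake
    return backlog

# 10 workers feeding 3: net +700 rows/tick, so the queue grows without bound.
print(queued_after(100, producers=10, consumers=3))   # 70000
# Balanced 3 -> 3: the queue stays empty.
print(queued_after(100, producers=3, consumers=3))    # 0
```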
Is there a way to configure Ray to scale down some workers in this situation, or is such a feature planned?
1. Severity of the issue:
Low: Annoying but doesn’t hinder my work.
2. Environment:
- Ray version: 2.48.0
- Python version: 3.10.11
- OS: Linux
- Cloud/Infrastructure: 4 nodes x 4xA100s
- Other libs/tools (if relevant): -
3. What happened vs. what you expected:
- Expected: Ray scales down actors if a pipeline is unbalanced
- Actual: Ray never scales down actors as long as they’re busy and there’s space in the object store.