Hello everyone, we’re building a multi-model batch inference pipeline and so far, we’re enjoying Ray a lot! However, I’ve noticed one problem with autoscaling when using stateful maps (with Actors). Our pipeline looks as follows:
read_parquet → preprocess (load images etc., on CPU) → annotate (on GPU) → preprocess → annotate → … → write_parquet
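For reference, the pipeline is built roughly like this (a minimal sketch; the `Preprocess`/`Annotate` class bodies, paths, and stage count are placeholders, not our real code):

```python
import ray

class Preprocess:
    # CPU stage: load/decode images etc. (placeholder)
    def __call__(self, batch):
        return batch

class Annotate:
    # GPU stage: runs one of the models (placeholder)
    def __init__(self):
        pass  # load model weights here
    def __call__(self, batch):
        return batch

ds = ray.data.read_parquet("s3://bucket/input/")
for _ in range(3):  # one preprocess + annotate pair per model
    ds = ds.map_batches(Preprocess, concurrency=(1, 16))
    ds = ds.map_batches(Annotate, concurrency=(1, 16), num_gpus=1)
ds.write_parquet("s3://bucket/output/")
```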
Say we have 3 models in this pipeline and we want to use autoscaling, so we set concurrency to (1, 16) for every map operator. In the beginning, Ray will then assign many GPUs to the first annotator and fewer to the 2nd and 3rd. This uses the resources fully, so the scaling algorithm is satisfied, but the pipeline is unbalanced: the first annotator outputs far more records than the 2nd and 3rd can consume. Backpressure won’t be triggered, because the rows are very small and don’t take much space in the object store, and every worker has a long queue of blocks to process. But if the job crashes, all the progress of the first annotators is lost, because Ray doesn’t have a resume feature.
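To make the imbalance concrete, here is a toy back-of-the-envelope simulation (plain Python; the worker counts and per-worker rate are made-up numbers, not measurements from our cluster). It just shows that when the first stage keeps more workers than the second, the inter-stage queue grows linearly, and everything in that queue is work that would be lost on a crash:

```python
def queued_after(steps, producers, consumers, rate_per_worker=100):
    """Rows waiting between stage 1 and stage 2 after `steps` ticks,
    assuming each worker processes `rate_per_worker` rows per tick."""
    backlog = 0
    for _ in range(steps):
        backlog += producers * rate_per_worker            # stage 1 output
        backlog -= min(backlog, consumers * rate_per_worker)  # stage 2 intake
    return backlog

# 10 workers feeding 3: net +700 rows/tick, so the queue grows without bound.
print(queued_after(100, producers=10, consumers=3))   # 70000
# Balanced 3 -> 3: the queue stays empty.
print(queued_after(100, producers=3, consumers=3))    # 0
```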
Is there a way to configure Ray to scale down some workers in this situation, or is such a feature planned?
1. Severity of the issue:
Low: Annoying but doesn’t hinder my work.
2. Environment:
- Ray version: 2.48.0
- Python version: 3.10.11
- OS: Linux
- Cloud/Infrastructure: 4 nodes x 4xA100s
- Other libs/tools (if relevant): -
3. What happened vs. what you expected:
- Expected: Ray scales down actors if a pipeline is unbalanced
- Actual: Ray never scales down actors as long as they’re busy and there’s space in the object store.