1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.
2. Environment:
- Ray version: 2.47.1
- Python version: v3.12.11
- OS: Linux
- Cloud/Infrastructure: 8x A100 40GB Node
- Other libs/tools (if relevant):
3. What happened vs. what you expected:
- Expected: The number of tasks assigned to each actor can be reduced (4 → 2)
- Actual: 4 tasks are always assigned to each actor
I ran map_batches(DoclingConverter) with 126 actors.
Running Dataset: dataset_3_0. Active & requested resources: 0/96 CPU, 7/7 GPU, 1.9MB/93.1GB object store: 3%|███▋ | 111/3.83k [01:59<20:34, 3.01 row/s]
- StreamingRepartition: Tasks: 0; Actors: 0; Queued blocks: 0; Resources: 0.0 CPU, 1.9MB object store: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.83k/3.83k [01:57<00:00, 372 row/s]
- MapBatches(DoclingConverter): Tasks: 504; Actors: 126; Queued blocks: 3211; Resources: 0.0 CPU, 7.0 GPU, 66.1KB object store; [0/615 objects local]: 3%|██▍ | 112/3.83k [01:57<19:25, 3.19 row/s]
Each actor only handles 1 task at a time, but 4 tasks are always assigned to each actor (504 tasks / 126 actors), leaving the remaining 3 tasks in “Waiting for scheduling”.
Is there a way to adjust this factor? I would like to lower it from 4 to 2 so that more tasks (blocks) stay queued instead of being assigned to a specific actor.
I want to do this because my (bad) implementation of the task takes anywhere from 2 minutes to 1 hour, depending on the row data, so I would like to distribute tasks across the actors as evenly as possible.
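To make the concern concrete, here is a minimal toy simulation in plain Python. It is not Ray's actual scheduler: the `simulate` function, the round-robin assignment, and the durations are all hypothetical and only for illustration. It assumes each actor runs one task at a time, holds up to `max_in_flight` assigned tasks, and receives one new task from the shared queue whenever it finishes one.

```python
import heapq
from collections import deque


def simulate(durations, num_actors, max_in_flight):
    """Return the total time to finish all tasks in this toy model.

    Assumption (hypothetical, not Ray's real logic): tasks are handed out
    round-robin until every actor holds `max_in_flight` tasks, and an actor
    gets one more task from the shared queue each time it finishes one.
    """
    pending = deque(durations)  # tasks not yet assigned to any actor
    local = [deque() for _ in range(num_actors)]  # tasks pinned to each actor

    # Initial fill: give each actor up to `max_in_flight` tasks.
    for _ in range(max_in_flight):
        for actor in range(num_actors):
            if pending:
                local[actor].append(pending.popleft())

    # Each actor starts executing its first assigned task.
    events = []  # min-heap of (finish_time, actor)
    for actor in range(num_actors):
        if local[actor]:
            heapq.heappush(events, (local[actor].popleft(), actor))

    now = 0
    while events:
        now, actor = heapq.heappop(events)
        # Finishing a task frees one slot, so refill it from the shared queue.
        if pending:
            local[actor].append(pending.popleft())
        # Start the next task already pinned to this actor, if any.
        if local[actor]:
            heapq.heappush(events, (now + local[actor].popleft(), actor))
    return now


# Made-up workload in minutes: one slow row followed by seven fast ones.
durations = [60, 2, 2, 2, 2, 2, 2, 2]
for limit in (4, 2, 1):
    print(limit, simulate(durations, num_actors=2, max_in_flight=limit))
```

With these made-up numbers the total time is 66 minutes with 4 tasks per actor, 62 with 2, and 60 with 1, because short tasks pinned behind the slow one cannot move to the idle actor.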
Thank you!