Only 10 Ray Data actors are working with Ray Train and the rest are idle

RunLLM · June 12, 2026, 1:23pm

If each node has 16 CPUs and 32 GB of RAM and you allocate only two actors per node, each actor should have ample resources. If you still see only about 10 Ray Data actors active cluster-wide, the bottleneck is more likely one of three things: Ray Data’s internal scheduling, backpressure from downstream consumers (e.g., training workers not consuming data fast enough), or insufficient parallelism in the data pipeline (e.g., too few data blocks or partitions) (source).

To diagnose further, check that the number of data blocks matches or exceeds the number of actors, and make sure downstream stages (such as training) are not limiting throughput. Also verify that Ray Data is not scaling down actors because of backpressure or idle time. Would you like step-by-step guidance on how to check and adjust these settings?
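The block-count check above can be sketched in plain Python. This is only a rough illustration, not Ray's scheduling logic: the helper names `max_busy_actors` and `suggested_block_count` and the default of 4 blocks per actor are assumptions made for the example. It relies on one simple fact: an actor needs at least one block to have work, so the number of busy actors can never exceed the number of blocks.

```python
def max_busy_actors(num_blocks: int, num_actors: int) -> int:
    """Upper bound on how many actors can hold work at the same time.

    An actor needs at least one block to be busy, so the bound is the
    smaller of the two counts. (Hypothetical helper, not a Ray API.)
    """
    return min(num_blocks, num_actors)


def suggested_block_count(num_actors: int, blocks_per_actor: int = 4) -> int:
    """Block count that leaves every actor several blocks to work through.

    The multiplier of 4 is an arbitrary assumption for illustration,
    not a value recommended by Ray.
    """
    return num_actors * blocks_per_actor


if __name__ == "__main__":
    num_actors = 32  # example pool size, assumed for illustration
    num_blocks = 10  # example block count, assumed for illustration

    busy = max_busy_actors(num_blocks, num_actors)
    if busy < num_actors:
        target = suggested_block_count(num_actors)
        print(f"At most {busy} of {num_actors} actors can be busy; "
              f"consider repartitioning to about {target} blocks.")
    else:
        print("Block count is not the limiting factor.")
```

With Ray itself, the block count would typically come from something like `ds.materialize().num_blocks()`, and the dataset could be split further with `ds.repartition(n)`. If the block count already exceeds the actor count, the cause is more likely downstream backpressure or autoscaling of the actor pool.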

Sources:

https://discuss.ray.io/t/ray-dataset-pipeline-scheduling-missing-opportunities/11737

