Only 10 Ray Data actors are working with Ray Train and the rest are idle

RunLLM · June 12, 2026, 4:58pm

Yes. If your GPU object store memory (or overall node memory) is nearly full (e.g., 87.7% as shown), Ray tasks can queue, slow down, or even get stuck because of memory pressure and backpressure in the pipeline. When object store memory is full, Ray cannot create or fetch new objects efficiently, which leads to long queueing times and idle actors (source, source).

To resolve this, reduce memory usage by lowering parallelism, using smaller batches, or scaling up your cluster with more memory. Also monitor object store usage in the Ray Dashboard, and consider enabling object spilling to disk if it is not already configured. Would you like more detail on configuring object spilling or memory management?
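As a rough illustration of the object spilling suggestion, here is a minimal sketch that builds the keyword arguments for `ray.init()`. It assumes you are starting a new local Ray instance (as far as I know, `_system_config` is not applied when connecting to an already running cluster). The helper names `build_ray_init_kwargs` and `init_ray_with_spilling`, the spill directory, and the 8 GiB object store size are all hypothetical placeholders, not values from this thread. Recent Ray versions already spill to the temp directory by default, so this mainly matters if you want the spill files on a larger disk.

```python
import json


def build_ray_init_kwargs(spill_dir, object_store_bytes=None):
    """Build kwargs for ray.init() that point object spilling at spill_dir.

    Hypothetical helper for illustration; the directory and size are assumptions.
    """
    kwargs = {
        "_system_config": {
            # Ray expects the spilling config as a JSON-encoded string.
            "object_spilling_config": json.dumps(
                {"type": "filesystem", "params": {"directory_path": spill_dir}}
            )
        }
    }
    if object_store_bytes is not None:
        # Optional explicit object store size, in bytes.
        kwargs["object_store_memory"] = int(object_store_bytes)
    return kwargs


def init_ray_with_spilling():
    """Start a local Ray instance with the settings above (requires ray)."""
    import ray  # imported lazily so the helper above works without Ray installed

    return ray.init(**build_ray_init_kwargs("/tmp/ray_spill", 8 * 1024**3))
```

Calling `init_ray_with_spilling()` at the top of the training script would apply both settings. Spilling keeps the pipeline from stalling when the object store fills up, but it trades memory pressure for disk I/O, so lowering parallelism or batch size is still worth trying first.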

