Is Ray Core suitable for long-running, I/O-heavy streaming with 200+ actors per cluster?

RunLLM · June 22, 2026, 6:32pm

Ray Core can likely be used for long-lived, I/O-heavy streaming pipelines built on actors, but there are caveats and best practices to consider:

For I/O-bound workloads, async actors are generally recommended over tasks: they avoid per-task process overhead and let a single actor overlap many network calls. Long-running pollers and message processors fit this model well, as discussed in earlier forum threads on long-running I/O tasks.
Regarding GCS and long-lived actors: most reported GCS degradation involves clusters that create massive numbers of short-lived tasks or actors, which leads to unbounded metadata growth. A stable set of long-lived actors carries less risk, and there is no explicit evidence that long-lived actors alone cause GCS issues. Some users have nonetheless reported slowdowns in long-running clusters, so monitor GCS memory and dashboard responsiveness, and consider periodic restarts if you observe degradation.
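As a rough illustration of the async-actor recommendation, here is a minimal sketch. The `Poller` class, its `fetch` and `stats` methods, and the `run_locally` / `run_on_ray` helpers are hypothetical names invented for this example, and `asyncio.sleep` stands in for a real network call. The Ray part relies only on `ray.remote`, `.options(max_concurrency=...)`, and `ray.get`; the `max_concurrency=100` value is an arbitrary assumption, not a tuned recommendation.

```python
import asyncio
import time


class Poller:
    """Hypothetical long-lived poller. Because its methods are coroutines,
    Ray runs it as an async actor when it is wrapped with ray.remote."""

    def __init__(self, source: str):
        self.source = source
        self.processed = 0

    async def fetch(self, delay_s: float) -> str:
        # asyncio.sleep is a stand-in for a real network call (assumption).
        await asyncio.sleep(delay_s)
        self.processed += 1
        return f"{self.source}:{self.processed}"

    async def stats(self) -> int:
        return self.processed


async def run_locally(n: int = 50, delay_s: float = 0.05) -> float:
    """Run n overlapping fetches on one event loop; return elapsed seconds."""
    poller = Poller("demo")
    start = time.perf_counter()
    await asyncio.gather(*(poller.fetch(delay_s) for _ in range(n)))
    return time.perf_counter() - start


def run_on_ray(n: int = 50, delay_s: float = 0.05) -> int:
    """Same workload as a Ray async actor (requires Ray to be installed)."""
    import ray  # imported lazily so the local demo runs without Ray

    ray.init(ignore_reinit_error=True)
    # max_concurrency caps how many coroutines run at once inside the actor.
    actor = ray.remote(Poller).options(max_concurrency=100).remote("demo")
    # All calls are submitted up front; the actor interleaves them while
    # each one is awaiting I/O.
    ray.get([actor.fetch.remote(delay_s) for _ in range(n)])
    return ray.get(actor.stats.remote())


if __name__ == "__main__":
    print(f"local elapsed: {asyncio.run(run_locally()):.2f}s")
```

Because the waits overlap, the local run should finish in roughly one `delay_s` rather than `n * delay_s`. With around 200 such actors per cluster, the same pattern keeps each actor to a single process while still servicing many in-flight requests.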

Would you like more detail on any of these points, or on mailbox throughput, multi-cluster patterns, or maintenance recommendations?

