It seems likely that Ray Core can be used for long-lived, I/O-heavy streaming pipelines with actors, but there are important caveats and best practices to consider:
- For I/O-bound workloads, async actors are generally recommended over tasks. Many concurrent calls share one actor process and its event loop, so you avoid per-task scheduling and process-level overhead while waiting on network I/O. Ray actors work well as long-running pollers and message processors. For I/O-heavy ones, define their methods with `async def` (which makes them async actors) and tune `max_concurrency` to control how many calls can be in flight at once, as discussed in this thread and here.
- Regarding the GCS and long-lived actors: most reported GCS degradation involves clusters that create very large numbers of short-lived tasks or actors, whose metadata grows without bound. A stable set of long-lived actors carries less risk, but you should still monitor GCS memory and dashboard responsiveness, since some users have reported slowdowns in long-running clusters (example, example). There is no explicit evidence that long-lived actors alone cause GCS issues; the best practice is to watch for memory growth and consider periodic restarts if you observe degradation.
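For the async-actor point, here is a minimal sketch of a long-lived poller, assuming a hypothetical `fetch` coroutine that stands in for your real source client (a message-queue or HTTP client, for example). `PollingProcessor` and `as_ray_actor` are illustrative names, not Ray APIs. The class is plain Python so its logic can be tested with `asyncio` alone; `as_ray_actor` wraps it with `ray.remote(max_concurrency=...)`, and because its methods are `async def`, Ray should run it as an async actor.

```python
import asyncio
from typing import Awaitable, Callable, List


class PollingProcessor:
    """Long-lived poller/processor intended to run as a Ray async actor.

    `fetch` is a hypothetical coroutine function standing in for a real
    source client; it returns the next batch of messages (possibly empty).
    """

    def __init__(self, fetch: Callable[[], Awaitable[List[str]]], interval_s: float = 0.5):
        self._fetch = fetch
        self._interval_s = interval_s
        self._stopped = False
        self.processed = 0

    async def _handle(self, msg: str) -> None:
        # Placeholder for per-message network I/O (e.g. forwarding `msg`).
        await asyncio.sleep(0)
        self.processed += 1

    async def run(self) -> int:
        # Loop until stop() is called. Every `await` hands the event loop
        # to other concurrent calls on the same actor, such as stats().
        while not self._stopped:
            batch = await self._fetch()
            await asyncio.gather(*(self._handle(m) for m in batch))
            await asyncio.sleep(self._interval_s)
        return self.processed

    async def stop(self) -> None:
        self._stopped = True

    async def stats(self) -> int:
        return self.processed


def as_ray_actor(max_concurrency: int = 100):
    # Lazy import so the class above can be unit-tested without Ray installed.
    import ray

    # ray.remote(**options) returns a decorator; applying it to a class whose
    # methods are `async def` yields an async actor class.
    return ray.remote(max_concurrency=max_concurrency)(PollingProcessor)
```

In a Ray driver, something like `actor = as_ray_actor().remote(fetch)` followed by `actor.run.remote()` starts the loop, and `ray.get(actor.stats.remote())` can still return while `run` is in progress, because calls on an async actor interleave on one event loop instead of queuing behind the long-running method. In practice you would usually construct the client inside the actor rather than passing `fetch` in, since constructor arguments must be serializable.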
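For the monitoring advice, a minimal stdlib-only sketch of tracking GCS memory on the head node. It assumes Linux (it reads `/proc`) and that the GCS runs as a process named `gcs_server`, which appears to be how current Ray releases name it; confirm with `ps` on your head node. `find_pids`, `rss_kb`, and `grew_too_much` are hypothetical helpers, and the 1.5x growth threshold is an arbitrary placeholder to tune.

```python
import os
from typing import List, Optional


def find_pids(comm: str = "gcs_server", proc_root: str = "/proc") -> List[int]:
    """Return PIDs whose /proc/<pid>/comm equals `comm` (Linux only)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == comm:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return sorted(pids)


def rss_kb(pid: int, proc_root: str = "/proc") -> Optional[int]:
    """Resident set size in kB from /proc/<pid>/status, or None if unavailable."""
    try:
        with open(os.path.join(proc_root, str(pid), "status")) as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        return None
    return None


def grew_too_much(samples_kb: List[int], max_ratio: float = 1.5) -> bool:
    """Flag growth: the latest sample exceeds the first by more than `max_ratio`."""
    return len(samples_kb) >= 2 and samples_kb[-1] > samples_kb[0] * max_ratio
```

Sampling `rss_kb` for each PID from `find_pids()` every few minutes (from cron or a small sidecar) and alerting when `grew_too_much` returns true gives a concrete signal for when a restart may be warranted, rather than restarting on a fixed schedule.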
Would you like more detail on any of these points, or on mailbox throughput, multi-cluster patterns, or maintenance recommendations?