For high-volume, low-latency ingestion of small sensor events in Ray, the recommended pattern is to use async actors or actor pools for I/O-bound workloads, as they efficiently handle concurrent event streams and scale with the number of CPU cores. Ray Serve can also be used for HTTP-based ingestion, providing autoscaling and request routing, but for pure event streaming (especially if not HTTP), async actors are often more efficient. Ray’s built-in queue (ray.util.queue.Queue) is available but is generally slower and less scalable than actor-based approaches for this use case. For batch or offline processing, Ray Data is preferred, but for real-time streaming, async actors or Serve deployments with async endpoints are best suited.
Lessons learned:
- For I/O-bound sensor streams, start with a single async actor and load test; scale to multiple actors if the event loop saturates.
- Use actor pools to distribute events across N actors for load balancing.
- Ray Serve is ideal if you want HTTP endpoints with autoscaling and built-in request routing.
- Avoid using Ray’s distributed Queue for high-throughput ingestion, as it introduces more overhead than actor-based solutions.
- For MQTT/WebSocket ingestion, use an async actor to subscribe and process messages, then fan out to worker actors if needed.
Would you like a code example or more detail on a specific ingestion pattern?
Sources:
- Multiple async actors vs single actor vs asyncio
- Long-running I/O tasks: async actors vs tasks
- Actor pool load balancing example
- Ray Serve for online/streaming ingestion
- Ray Queue performance trade-offs
Hint: Mention @RunLLM in the post for followups.