Stream processing of events (feature pre-processing) with "at least once" guarantee & auto-scaling

Hello,
I’m looking into using Ray for a project and would appreciate some feedback or guidance on whether it is a good fit, and how best to implement it. The workflow is as follows:

  • Clients upload videos to an S3 bucket.
  • Adding a video to the bucket sends an event to an SQS queue announcing the new video file.
  • I need to process videos with an expensive GPU model, in a distributed streaming fashion. The longer the video, the longer this process takes. I cannot split a single video into fixed-size chunks to parallelize processing of a single video.
  • I need the processing to scale up or down according to queue length (to provide “low latency” when many videos are in the queue, but to avoid paying for idle GPUs when the queue is empty). I would want the system to scale down to 0 GPUs if there are no videos in the queue.
  • I need to make sure I process all videos at least once. Ideally exactly-once, but at-least-once is fine as long as re-processing happens only under specific circumstances (e.g. redeploying).
  • I need to persist the output to another S3 bucket, or a feature store.

I could do it with Spark Streaming: it provides the at-least-once guarantee and the ability to scale up and down, BUT the mini-batch synchronisation is problematic: a long video would block its batch for a long time, and other videos would have to wait until that video is processed before being considered.

I could also theoretically do it in Flink, which provides at-least-once guarantees and does not suffer from Spark’s micro-batch limitation, BUT auto-scaling seems to be experimental and has strong limitations.

Because none of these options is great, I was thinking of implementing this as a simple K8s Deployment, where each pod reads from the queue and processes events as they come in. I would scale the Deployment up and down according to the message queue length.
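For the K8s route, I think the at-least-once part would fall out of SQS’s visibility timeout: delete the message only after processing succeeds, so any message whose worker crashed mid-video becomes visible again and is retried by another pod. A minimal sketch of what I imagine the per-pod loop would look like (`process_video` is a placeholder for the GPU step, and the queue URL / timeout values are assumptions):

```python
import json


def parse_s3_event(body: str) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 event notification message body."""
    event = json.loads(body)
    return [
        (rec["s3"]["bucket"]["name"], rec["s3"]["object"]["key"])
        for rec in event.get("Records", [])
    ]


def process_video(bucket: str, key: str) -> None:
    """Placeholder for the expensive GPU model + writing features to the output bucket."""


def worker_loop(queue_url: str) -> None:
    import boto3  # assumed available in the worker image

    sqs = boto3.client("sqs")
    while True:
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,     # long polling to avoid busy-waiting on an empty queue
            VisibilityTimeout=900,  # must exceed worst-case per-video processing time
        )
        for msg in resp.get("Messages", []):
            for bucket, key in parse_s3_event(msg["Body"]):
                process_video(bucket, key)
            # Delete only after success: messages that were never deleted reappear
            # after the visibility timeout, which is what gives at-least-once.
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```

The deployment itself would then be scaled on queue length by something external (e.g. a queue-length-based autoscaler), which is the part I was hoping Ray could handle more cleanly.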

Would there be a simple and clean solution to do it with Ray?

  1. I was looking into Ray Workflows, where an SQS listener could create a new workflow for each incoming message. But I am not sure about the following:
    a. Would Ray be able to scale up or down according to how many workflows are in the queue?
    b. Would Ray Workflows be able to handle millions of events?
    c. Would Ray Workflows be able to resume processing events after a redeployment?
  2. I was also looking at implementing it with a streaming Ray Dataset, but as far as I can tell:
    a. this would not support scaling up and down based on load, as parallelism would be fixed?
    b. there is no support for checkpointing/at-least-once guarantees?
  3. I was otherwise looking at implementing it with a Queue + Actors, but:
    a. I don’t think there would be any out-of-the-box at-least-once support, and I would need to reimplement it.
    b. I am also not sure the actor pool could scale up and down automatically?
  4. Finally, I saw people do it with Ray Serve:
    a. auto-scaling could work there, BUT it would not scale down to 0 when no videos are in the queue (the minimum is 1)?
    b. I would still need some code to poll SQS and issue calls to Serve, and implement at-least-once myself on that side?
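For options 3 and 4, the scaling signal itself seems cheap to derive from SQS’s `ApproximateNumberOfMessages` attribute. Here is a rough sketch of the controller side I have in mind, where `pool.resize(n)` is a hypothetical stand-in for whatever would actually add/remove actors or replicas, and `videos_per_worker` / `max_workers` are made-up tuning knobs:

```python
import math


def desired_replicas(queue_depth: int, videos_per_worker: int = 2, max_workers: int = 8) -> int:
    """Map SQS queue depth to a worker count, scaling to zero on an empty queue."""
    if queue_depth <= 0:
        return 0
    return min(max_workers, math.ceil(queue_depth / videos_per_worker))


def autoscale_once(queue_url: str, pool) -> None:
    """One tick of a queue-length-based autoscaler (run periodically)."""
    import boto3  # assumed available on the controller node

    sqs = boto3.client("sqs")
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    depth = int(attrs["Attributes"]["ApproximateNumberOfMessages"])
    pool.resize(desired_replicas(depth))  # hypothetical pool API
```

This is only the decision logic; `pool.resize` could mean killing/creating GPU actors, or patching a K8s Deployment’s replica count. The part I would rather not hand-roll is exactly this loop, which is why I am asking whether Ray has something built in.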

I would greatly appreciate your feedback & insight on this topic; there might be something obvious that I am missing :sweat_smile:

Thanks!