Caching and batching with serve

skotchZ · April 15, 2021, 11:45pm

I have a question about building a REST API service that batches requests and caches their results. As far as I know, one of the best ways to cache in a distributed system is Redis, but when I check the cache while processing a batch, I lose potential throughput. Example: the request queue is [q1, q2, q3, … q15] and my max batch size is 4. Ray takes q1, q2, q3, q4, but q1 and q2 are already cached, so two of the four batch slots go to requests that could have been answered from the cache. It would be better to check requests against the cache before putting them in the queue. How can I do this? Would Composing Multiple Models be the proper way?

architkulkarni · April 17, 2021, 12:19am

Yes, I think that's a good idea! You can create a simple backend A that only checks the cache and, on a miss, forwards the request to your main backend B, which accepts batches. Cache hits then return immediately and never take up a slot in one of B's batches. Backend A can call backend B through a ServeHandle, as in the Composing Multiple Models docs you mentioned.
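Ray Serve's API has changed a lot since this thread (recent versions use deployments, with the `@serve.batch` decorator for batching), so rather than tie the idea to one version, here is a minimal framework-free sketch of the same "cache in front, batcher behind" layout in plain asyncio. `CachingFrontend` plays backend A and `BatchingBackend` plays backend B; both classes, `fake_model`, and the in-process dict standing in for Redis are hypothetical illustrations, not Ray APIs.

```python
import asyncio
from typing import Callable, Dict, List


class BatchingBackend:
    """Plays the role of "backend B": groups queued requests into batches."""

    def __init__(self, model_fn: Callable[[List[str]], List[str]],
                 max_batch_size: int = 4, batch_wait_s: float = 0.01):
        self.model_fn = model_fn
        self.max_batch_size = max_batch_size
        self.batch_wait_s = batch_wait_s
        self.queue = asyncio.Queue()
        self.batch_sizes: List[int] = []  # recorded so the demo can show how batches formed
        self._worker = None

    async def submit(self, item: str) -> str:
        # Start the batching loop lazily, inside the running event loop.
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def _run(self) -> None:
        while True:
            batch = [await self.queue.get()]
            # Give other requests a short window to arrive, then take up to max_batch_size.
            await asyncio.sleep(self.batch_wait_s)
            while len(batch) < self.max_batch_size and not self.queue.empty():
                batch.append(self.queue.get_nowait())
            self.batch_sizes.append(len(batch))
            outputs = self.model_fn([item for item, _ in batch])
            for (_, fut), out in zip(batch, outputs):
                if not fut.done():
                    fut.set_result(out)


class CachingFrontend:
    """Plays the role of "backend A": answers cache hits directly, forwards misses."""

    def __init__(self, backend: BatchingBackend):
        self.backend = backend
        self.cache: Dict[str, str] = {}  # assumption: in-process dict standing in for Redis
        self.hits = 0

    async def handle(self, key: str) -> str:
        if key in self.cache:
            self.hits += 1
            return self.cache[key]  # a hit never reaches the batch queue
        result = await self.backend.submit(key)
        self.cache[key] = result
        return result


def fake_model(batch: List[str]) -> List[str]:
    # Hypothetical model: upper-cases each input in the batch.
    return [x.upper() for x in batch]


async def main():
    backend = BatchingBackend(fake_model, max_batch_size=4)
    front = CachingFrontend(backend)
    front.cache.update({"q1": "Q1", "q2": "Q2"})  # q1 and q2 are already cached
    requests = [f"q{i}" for i in range(1, 16)]     # q1 ... q15, as in the question
    results = await asyncio.gather(*(front.handle(r) for r in requests))
    return results, front.hits, backend.batch_sizes


results, hits, batch_sizes = asyncio.run(main())
print(hits, batch_sizes)  # 2 [4, 4, 4, 1]
```

With a max batch size of 4, the 13 misses fill batches of 4, 4, 4 and 1, and the two cache hits never occupy a batch slot. In a real deployment the dict would be replaced by a shared store such as Redis, and keeping the cache check in its own component lets it scale independently of the batching model.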
