I have a question about building a REST API service with batching and caching of request results. As far as I know, one of the best ways to cache in a distributed system is with Redis, but if I check the cache while processing a batch, I lose potential throughput. Example: the request queue is [q1, q2, q3, … q15] and my max batch size is 4. Ray takes q1, q2, q3, q4, but q1 and q2 are already cached, so only two of the four batch slots do real work. It would be better to check requests against the cache before putting them in the queue. How can I do this? Would Composing Multiple Models be the proper way?
Yes, I think that’s a good idea! You can create a simple backend A which just checks the cache, and if not cached, forwards the request to your main backend B that accepts batches. You can call backend B from backend A by using ServeHandles like in the Composing Multiple Models docs you mentioned.
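To make the idea concrete, here is a rough, framework-agnostic sketch of the pattern in plain asyncio. It is not Ray Serve code: `CachingFrontend`, `BatchedBackend`, and the in-memory dict standing in for Redis are all hypothetical names for illustration. In Ray Serve, backend B would presumably be a deployment whose handler is wrapped with `@serve.batch`, and backend A would call it through a handle instead of `submit`.

```python
import asyncio


class BatchedBackend:
    """Hypothetical stand-in for backend B: processes requests in batches."""

    def __init__(self, max_batch_size=4, wait_s=0.01):
        self.max_batch_size = max_batch_size
        self.wait_s = wait_s
        self.queue = asyncio.Queue()
        self.batches = []  # record of every batch processed, for inspection

    async def submit(self, request):
        # Enqueue the request and wait until a batch run resolves it.
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((request, future))
        return await future

    async def run(self):
        while True:
            items = [await self.queue.get()]
            # Give other requests a short window to arrive, then fill the batch.
            await asyncio.sleep(self.wait_s)
            while len(items) < self.max_batch_size and not self.queue.empty():
                items.append(self.queue.get_nowait())
            self.batches.append([request for request, _ in items])
            for request, future in items:
                future.set_result(f"result({request})")  # placeholder for the model call


class CachingFrontend:
    """Hypothetical stand-in for backend A: checks the cache before enqueueing."""

    def __init__(self, backend, cache=None):
        self.backend = backend
        # Assumption: a dict replaces Redis here to keep the sketch self-contained.
        self.cache = {} if cache is None else cache

    async def handle(self, request):
        if request in self.cache:
            return self.cache[request]  # cache hit: never reaches the batch queue
        result = await self.backend.submit(request)
        self.cache[request] = result
        return result


async def demo():
    backend = BatchedBackend(max_batch_size=4)
    worker = asyncio.create_task(backend.run())
    # q1 and q2 are already cached, as in the example from the question.
    frontend = CachingFrontend(
        backend, cache={"q1": "result(q1)", "q2": "result(q2)"}
    )
    requests = [f"q{i}" for i in range(1, 7)]
    results = await asyncio.gather(*(frontend.handle(r) for r in requests))
    worker.cancel()
    return results, backend.batches


if __name__ == "__main__":
    results, batches = asyncio.run(demo())
    print(batches)
```

In this sketch q1 and q2 are answered straight from the cache, so the only batch the backend sees is q3 through q6 and all four slots do real work. With a real Redis cache the lookup would be a network call, so you would probably want an async client in the frontend to avoid blocking other requests.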