How severe does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
I have an iterative model that requires storing state locally and then using it for following client interactions. I would like to be able to control routing of requests to make iterative model calls performant.
Something like this:
1. Post initial request to start session -> client receives uuid for the session
2. In each followup request, the client attaches uuid and this is used to route traffic smartly to a deployment with that session stored locally.
(Note there can be thousands of clients connected at any given moment so it should be able to handle that type of load)
I was thinking to create a distributed cache, where any deployment which receives the request first looks locally for the session object, and then requests it from anyone who has it. The problem then becomes how to maintain consistency.
Another option is using websockets with unique channel per session, maybe using GitHub - permitio/fastapi_websocket_pubsub: A fast and durable Pub/Sub channel over Websockets. FastAPI + WebSockets + PubSub == ⚡ 💪 ❤️.
Question, has anyone done resolved this problem (using any method) in a performant manner with ray serve?