How to efficiently do iterative model requests with local state per client

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes significant difficulty to completing my task, but I can work around it.

I have an iterative model that requires storing state locally and then reusing it in subsequent interactions with the same client. I would like to be able to control the routing of requests so that iterative model calls are performant.

Something like this:

1. The client posts an initial request to start a session -> the client receives a uuid for the session
2. In each follow-up request, the client attaches the uuid, and this is used to route the request to the deployment replica that has that session stored locally.

(Note that there can be thousands of clients connected at any given moment, so the solution should be able to handle that kind of load.)
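
To make the two steps concrete, here is a minimal pure-Python sketch of one way the routing could work, assuming the uuid is hashed to pick a replica. `Replica`, `pick_replica`, and `start_session` are hypothetical names used only for illustration; none of them are Ray Serve APIs, and the "model" is just a running sum.

```python
import hashlib
import uuid


class Replica:
    """Hypothetical stand-in for one replica that keeps session state in memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session id -> model state

    def step(self, session_id, x):
        # One iterative call: update the locally stored state and return it.
        state = self.sessions.get(session_id, 0)
        self.sessions[session_id] = state + x
        return self.sessions[session_id]


def pick_replica(session_id, replicas):
    # Deterministic choice: the same uuid maps to the same replica
    # as long as the list of replicas does not change.
    digest = hashlib.sha256(session_id.encode()).digest()
    return replicas[int.from_bytes(digest[:8], "big") % len(replicas)]


def start_session():
    # Step 1: the client gets a uuid back for its session.
    return str(uuid.uuid4())


replicas = [Replica(f"replica-{i}") for i in range(4)]
sid = start_session()

# Step 2: every follow-up request carries the uuid and lands on the same replica.
first = pick_replica(sid, replicas).step(sid, 1)
second = pick_replica(sid, replicas).step(sid, 2)
```

A limitation of this sketch is that plain modulo hashing remaps most sessions whenever the number of replicas changes, so it only illustrates the idea of sticky routing by uuid.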

I was thinking of creating a distributed cache: whichever replica receives a request first looks for the session object locally, and if it is not there, requests it from whichever replica has it. The problem then becomes how to maintain consistency.
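
A rough sketch of that local-first lookup, in plain Python. `CachingReplica` is a hypothetical class, and the peers are ordinary in-process objects here; in a real deployment each peer lookup would be a remote call. The sketch assumes the session is moved rather than copied, so that only one replica holds it at a time, which is one possible way to sidestep the consistency question.

```python
class CachingReplica:
    """Hypothetical replica with a local session cache and a list of peers."""

    def __init__(self, name):
        self.name = name
        self.local = {}  # session id -> session object
        self.peers = []  # other CachingReplica instances

    def get_session(self, session_id):
        # 1. Look in the local cache first.
        if session_id in self.local:
            return self.local[session_id]
        # 2. Otherwise ask the peers, and take ownership of the session
        #    so that exactly one copy exists afterwards.
        for peer in self.peers:
            if session_id in peer.local:
                session = peer.local.pop(session_id)
                self.local[session_id] = session
                return session
        raise KeyError(session_id)


a = CachingReplica("a")
b = CachingReplica("b")
a.peers = [b]
b.peers = [a]

a.local["s1"] = {"iteration": 3}  # session was created on replica a
state = b.get_session("s1")       # a later request lands on replica b
```

Asking every peer in turn does not scale to many replicas, and two replicas requesting the same session at the same moment would still need some form of coordination.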

Another option is using WebSockets with a unique channel per session, maybe using permitio/fastapi_websocket_pubsub (a Pub/Sub channel over WebSockets for FastAPI, hosted on GitHub).

Question: has anyone solved this problem (using any method) in a performant manner with Ray Serve?

Hey @Joshuaalbert, I think WebSockets would be a good solution here, but unfortunately Ray Serve doesn’t support them at the moment (we’re working towards it!). Your suggestion of having a distributed cache should work. For maintaining consistency, you could consider having a single named actor that the replicas query to look up where each session object is located (note that this actor can become a bottleneck as the number of sessions and requests grows).
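
As a rough illustration of the lookup idea, the directory could be as small as the class below. `SessionDirectory` and its method names are hypothetical, and the class is written as plain Python so the logic can be read (and run) without a cluster.

```python
class SessionDirectory:
    """Hypothetical directory mapping each session id to the replica that owns it."""

    def __init__(self):
        self._owner = {}  # session id -> replica id

    def register(self, session_id, replica_id):
        # Called by a replica when it creates or takes over a session.
        self._owner[session_id] = replica_id

    def lookup(self, session_id):
        # Returns the owning replica id, or None if the session is unknown.
        return self._owner.get(session_id)

    def remove(self, session_id):
        # Called when a session ends; a no-op if it was never registered.
        self._owner.pop(session_id, None)


directory = SessionDirectory()
directory.register("s1", "replica-2")
owner = directory.lookup("s1")
```

To turn this into a named actor, one option would be `ray.remote(SessionDirectory).options(name="session_directory", get_if_exists=True).remote()`, with the other replicas fetching the handle via `ray.get_actor("session_directory")` and calling `lookup.remote(...)`. The actor name is an assumption made for this example. Because an actor processes its calls one at a time by default, every lookup is serialized through it, which is where the bottleneck comes from.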