How to efficiently do iterative model requests with local state per client

Joshuaalbert · June 17, 2022, 4:55am

How severe does this issue affect your experience of using Ray?

Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I have an iterative model that requires storing state locally and then reusing it in subsequent interactions with the same client. I would like to control how requests are routed so that iterative model calls are performant.

Something like this:

1. Post initial request to start session -> client receives uuid for the session
2. In each followup request, the client attaches uuid and this is used to route traffic smartly to a deployment with that session stored locally.

(Note there can be thousands of clients connected at any given moment so it should be able to handle that type of load)
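A minimal sketch of the two steps above, assuming a fixed number of replicas and hash-based sticky routing. The function names and the replica count are hypothetical and are not part of Ray Serve's API:

```python
import hashlib
import uuid


def start_session() -> str:
    """Create a new session id that the client attaches to follow-up requests."""
    return str(uuid.uuid4())


def pick_replica(session_id: str, num_replicas: int) -> int:
    """Map a session id to a replica index deterministically.

    Uses a stable hash (not Python's salted built-in hash()) so that every
    router process agrees on the mapping.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_replicas


session_id = start_session()
first = pick_replica(session_id, num_replicas=4)
second = pick_replica(session_id, num_replicas=4)  # same replica as `first`
```

Note that plain modulo hashing remaps most sessions whenever the number of replicas changes, so this only keeps state local while the replica set is stable.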

I was thinking of creating a distributed cache, where whichever deployment receives the request first looks locally for the session object and, if it is not there, requests it from whichever deployment has it. The problem then becomes how to maintain consistency.
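Roughly what I have in mind, as an in-process sketch. The `Replica` class is a hypothetical stand-in for a deployment; in a real setup the peer lookup would be a remote call, and concurrent requests for the same session are not handled here:

```python
from typing import Dict, List, Optional


class Replica:
    """Hypothetical stand-in for one deployment with a local session cache."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sessions: Dict[str, dict] = {}
        self.peers: List["Replica"] = []

    def release(self, session_id: str) -> Optional[dict]:
        """Hand a session over to a peer and drop the local copy."""
        return self.sessions.pop(session_id, None)

    def get_session(self, session_id: str) -> Optional[dict]:
        # 1. Look in the local cache first.
        state = self.sessions.get(session_id)
        if state is not None:
            return state
        # 2. Otherwise ask each peer; the one holding the session gives it up.
        for peer in self.peers:
            state = peer.release(session_id)
            if state is not None:
                # This replica is now the only owner of the session.
                self.sessions[session_id] = state
                return state
        return None


a, b = Replica("a"), Replica("b")
a.peers, b.peers = [b], [a]
a.sessions["s1"] = {"step": 3}

state = b.get_session("s1")  # not local to b, so it is moved over from a
```

Moving the session instead of copying it keeps a single owner at any time, which sidesteps stale copies but does nothing about two replicas racing for the same session.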

Another option is using WebSockets with a unique channel per session, maybe using permitio/fastapi_websocket_pubsub (a Pub/Sub channel over WebSockets for FastAPI).

Question: has anyone solved this problem (using any method) in a performant manner with Ray Serve?

eoakes · June 24, 2022, 2:01pm

Hey @Joshuaalbert, I think WebSockets would be a good solution here but unfortunately Ray Serve doesn’t support them at the moment (we’re working towards it!). Your suggestion of having a distributed cache should work. For maintaining consistency, you could consider having a named actor that the other deployments query to look up where each session object is located (note that this actor can become a bottleneck, depending on scale).
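A rough sketch of what such a lookup actor could track. It is written as a plain Python class so it runs standalone; the class name, method names, and replica ids are all hypothetical. Under Ray, the class would be decorated with `@ray.remote`, created once as a named actor, and fetched elsewhere with `ray.get_actor`:

```python
from typing import Dict, Optional


class SessionDirectory:
    """Tracks which replica currently holds each session.

    Hypothetical helper: a single instance acts as the source of truth,
    which is what makes it consistent and also a potential bottleneck.
    """

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}

    def claim(self, session_id: str, replica_id: str) -> str:
        """Register replica_id as owner unless the session is already owned.

        Returns the actual owner so the caller can tell whether it won.
        """
        return self._owner.setdefault(session_id, replica_id)

    def lookup(self, session_id: str) -> Optional[str]:
        """Return the owning replica id, or None for an unknown session."""
        return self._owner.get(session_id)

    def forget(self, session_id: str) -> None:
        """Drop a finished session from the directory."""
        self._owner.pop(session_id, None)


directory = SessionDirectory()
owner = directory.claim("s1", "replica-0")
other = directory.claim("s1", "replica-1")  # loses: replica-0 already owns s1
```

Because a Ray actor runs its method calls one at a time by default, two replicas claiming the same session would be serialized through this one object, at the cost of every lookup going through a single process.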
