Scaling up handled requests when using the batching wrapper

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hello everyone, my issue is the following: I’m trying to automatically batch requests and send them to the OpenAI API using the batch decorator.

It initially works fine, but after 11 requests the batch is sent to the API, even though I should be able to accumulate thousands of requests in a single batch.
From what I understand, this was likely a bottleneck in the proxy. One solution proposed in this discussion was to use more replicas and nodes, but that is not possible for my task because I have to reduce the number of batches I send as much as possible (due to the limitations of the OpenAI API).
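For context, here is a minimal sketch of the kind of deployment I’m describing (the class name, handler, and batch parameters are illustrative placeholders, not my exact code):

```python
# Minimal sketch of a @serve.batch deployment; names and parameters are
# illustrative, not my real code.
from typing import List

from ray import serve
from starlette.requests import Request


@serve.deployment
class OpenAIBatcher:
    @serve.batch(max_batch_size=1000, batch_wait_timeout_s=1.0)
    async def handle_batch(self, prompts: List[str]) -> List[str]:
        # In the real code this would forward the whole accumulated batch to
        # the OpenAI API in a single call; this stub just echoes the inputs.
        return [f"processed: {p}" for p in prompts]

    async def __call__(self, request: Request) -> str:
        prompt = (await request.json())["prompt"]
        # Each individual request is transparently grouped into a batch.
        return await self.handle_batch(prompt)


app = OpenAIBatcher.bind()
```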

I tried multiple things, including disabling the proxy with serve.start(http_options={"proxy_location": ProxyLocation.Disabled}), but that was unsuccessful. I even attempted to write my own batcher, thinking that was the bottleneck, but the problem really does seem to be with the incoming requests: after 11 requests, my server kind of pauses to digest them, and once the answers have been sent back it continues doing the same thing (11 requests, process, send back, 11 requests, process, send back, etc.).

I’m open to any suggestions to solve this issue.

Could you share your code that uses @serve.batch? How large are the requests you’re sending?

Thanks for your reply, I found the issue later in the day. I tracked the POST requests with Wireshark and noticed I was seeing exactly the same number of requests as on my Ray Serve instance, so Ray Serve wasn’t the bottleneck (I kept the proxy disabled, but it may well work with it enabled too). The problem was coming from Python’s thread pool on the client side: I needed to set the maximum number of threads artificially high. I hope this helps someone else in the future, sorry for the bother.
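For anyone who hits the same thing, here is a sketch of what the fix looks like on the client side, assuming the requests are fanned out through a concurrent.futures.ThreadPoolExecutor (the URL, payloads, and worker count are placeholders):

```python
# Client-side sketch: raise max_workers so enough requests are in flight
# for the server to accumulate a full batch.
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/"  # placeholder for the Serve endpoint
prompts = [f"prompt {i}" for i in range(1000)]


def send(prompt: str) -> str:
    # One blocking HTTP call per prompt; batching happens server-side.
    return requests.post(URL, json={"prompt": prompt}).text


# The default max_workers is min(32, os.cpu_count() + 4), which caps how many
# requests are in flight at once. Raising it lets the server fill a batch
# before its batch wait timeout expires.
with ThreadPoolExecutor(max_workers=1000) as pool:
    results = list(pool.map(send, prompts))
```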