Scaling up handled requests when using the batching wrapper

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hello everyone, my issue is the following: I’m trying to automatically batch requests and send them to the OpenAI API using the batch decorator.

It initially works fine, but after 11 requests the batch is sent to the API, even though I should be able to accumulate thousands of requests in a single batch.
From what I understand, this was likely a bottleneck in the proxy. One solution proposed in this discussion was to use more replicas and nodes, but that is not possible for my task because I have to reduce the number of batches I send as much as possible (due to the limitations of the OpenAI API).
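For context, here is a minimal sketch of the kind of deployment I’m describing (the class name, handler, and batch parameters are illustrative placeholders, not my exact code):

```python
# Minimal sketch of a @serve.batch deployment; names and parameters are
# illustrative, not my real code.
from typing import List

from ray import serve
from starlette.requests import Request


@serve.deployment
class OpenAIBatcher:
    @serve.batch(max_batch_size=1000, batch_wait_timeout_s=1.0)
    async def handle_batch(self, prompts: List[str]) -> List[str]:
        # In the real code this would forward the whole accumulated batch to
        # the OpenAI API in a single call; this stub just echoes the inputs.
        return [f"processed: {p}" for p in prompts]

    async def __call__(self, request: Request) -> str:
        prompt = (await request.json())["prompt"]
        # Each individual request is transparently grouped into a batch.
        return await self.handle_batch(prompt)


app = OpenAIBatcher.bind()
```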

I tried multiple things, including disabling the proxy with serve.start(http_options={"proxy_location": ProxyLocation.Disabled}), but that was unsuccessful. I even attempted to write my own batcher, thinking that was the bottleneck, but the problem really does seem to be with the incoming requests: after 11 requests, my server kind of pauses to digest them, and once the answers have been sent back it continues doing the same thing (11 requests, process, send back, 11 requests, process, send back, etc.).

I’m open to any suggestions to solve this issue.

Could you share your code that uses @serve.batch? How large are the requests you’re sending?

Thanks for your reply, I found the issue later in the day. I tracked the POST requests with Wireshark and noticed I was seeing exactly the same number of requests as on my Ray Serve instance, so Ray Serve wasn’t the bottleneck (I kept the proxy disabled, but it may well work with it enabled too). The problem was coming from Python’s thread pool on the client side: I needed to set the maximum number of threads artificially high. I hope this helps someone else in the future, sorry for the bother.
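For anyone who hits the same thing, here is a sketch of what the fix looks like on the client side, assuming the requests are fanned out through a concurrent.futures.ThreadPoolExecutor (the URL, payloads, and worker count are placeholders):

```python
# Client-side sketch: raise max_workers so enough requests are in flight
# for the server to accumulate a full batch.
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/"  # placeholder for the Serve endpoint
prompts = [f"prompt {i}" for i in range(1000)]


def send(prompt: str) -> str:
    # One blocking HTTP call per prompt; batching happens server-side.
    return requests.post(URL, json={"prompt": prompt}).text


# The default max_workers is min(32, os.cpu_count() + 4), which caps how many
# requests are in flight at once. Raising it lets the server fill a batch
# before its batch wait timeout expires.
with ThreadPoolExecutor(max_workers=1000) as pool:
    results = list(pool.map(send, prompts))
```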