Scaling up handled requests when using the batching wrapper

Thanks for your reply! I found the issue later that day. I captured the POST requests with Wireshark and saw exactly the same request count reaching my Ray Serve instance, so Ray Serve wasn't the bottleneck (I kept the proxy disabled, but it may well work with it enabled too). The problem was Python's thread pool: I needed to set the maximum number of threads much higher than the default. I hope this helps someone else in the future, sorry for the noise!
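For anyone hitting the same wall, here's a minimal sketch of the kind of change involved, assuming the blocking calls go through asyncio's default executor. The value 256 is just illustrative; tune it for your own workload:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# CPython caps the default executor at min(32, os.cpu_count() + 4) workers,
# which can silently limit how many blocking calls run concurrently.
executor = ThreadPoolExecutor(max_workers=256)  # illustrative value, tune as needed

loop = asyncio.new_event_loop()
# From here on, loop.run_in_executor(None, ...) uses the larger pool.
loop.set_default_executor(executor)
```

The same idea applies if you construct the pool yourself and pass it to `run_in_executor` directly.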