Scaling up handled requests when using the batching wrapper

Thanks for your reply! I found the issue later that day. I captured the POST requests with Wireshark and saw exactly the same request count reaching my Ray Serve instance, so Ray Serve wasn't the bottleneck (I kept the proxy disabled, but it may well work with it enabled too). The problem was Python's thread pool: I needed to set the maximum number of threads much higher than the default. I hope this helps someone else in the future, sorry for the noise!
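For anyone hitting the same wall, here's a minimal sketch of the kind of change involved, assuming the blocking calls go through asyncio's default executor. The value 256 is just illustrative; tune it for your own workload:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# CPython caps the default executor at min(32, os.cpu_count() + 4) workers,
# which can silently limit how many blocking calls run concurrently.
executor = ThreadPoolExecutor(max_workers=256)  # illustrative value, tune as needed

loop = asyncio.new_event_loop()
# From here on, loop.run_in_executor(None, ...) uses the larger pool.
loop.set_default_executor(executor)
```

The same idea applies if you construct the pool yourself and pass it to `run_in_executor` directly.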