Hi @Akshay_Malik ,
The upstream issue is fixed, but Ray Serve shows a huge drop in performance once concurrency goes beyond 250. We currently have 10 GPU worker nodes with 6 model replicas each, so 60 model replicas serving in total. Latency is spiking and throughput is stuck; it does not improve with more replicas.
Another ref: Non-linear throughput when scaling Ray Serve replicas - #2 by christina
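
For context, here is a rough back-of-the-envelope sketch of the load per replica at the point where things degrade. It assumes requests are spread evenly across all replicas, which may not hold in practice, and the variable names are only illustrative:

```python
# Figures taken from the setup described above.
worker_nodes = 10
replicas_per_node = 6
concurrency = 250  # client concurrency at which the drop starts

# Total replicas across the cluster.
total_replicas = worker_nodes * replicas_per_node

# Assumption: requests are balanced evenly across replicas.
in_flight_per_replica = concurrency / total_replicas

print(f"total replicas: {total_replicas}")
print(f"in-flight requests per replica: {in_flight_per_replica:.2f}")
```

So under that assumption each replica only sees about four requests in flight at 250 concurrency.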