Understanding performance of Ray serve

Can you please explain what you mean by node and how to increase it? I run this code on my local machine with 16 CPUs

I mean the number of machines. Each machine runs only one proxy, so that proxy is likely the bottleneck. If you try this workload with more machines– but the same number of replicas– and the max QPS goes up, then the proxy actor is likely the bottleneck.