On EKS I have 7 replicas. I have a websocket endpoint that all clients send messages to.
I want to find out how many messages are received by each replica and what the response time is.
With HTTP requests this was easy to see, even at the AWS load balancer level, but with a websocket the individual messages are hidden inside the connection.
Is there any metric exposed by Ray that can show me this?
Your QPS and latency graphs do not work with websocket messages.
Hi! Yes, Ray Serve provides several metrics that can help you track the performance and activity of your replicas. Here are two that are relevant to what you described:
ray_serve_deployment_request_counter_total - tracks the number of queries processed by each replica
ray_serve_deployment_processing_latency_ms - latency for queries processed by each replica
You can also set up Prometheus and Grafana to scrape and chart these metrics. Refer to the Ray Serve monitoring documentation and the Ray Dashboard, which provides visualizations of these metrics.
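If it helps, here is a minimal sketch of how the two metrics above could be summarized per replica from Prometheus text-format output. It assumes the samples carry a `replica` label and that the latency metric is exported as a histogram with `_sum` and `_count` series. The `SAMPLE` text, the `ws#a`/`ws#b` replica names, and the `per_replica_stats` helper are made up for illustration and are not Ray output or a Ray API. It is also worth checking whether the request counter moves per websocket message or only per connection in your setup.

```python
import re
from collections import defaultdict

# Hypothetical sample in Prometheus text exposition format.
# The label values and numbers are invented for illustration only.
SAMPLE = """\
# TYPE ray_serve_deployment_request_counter_total counter
ray_serve_deployment_request_counter_total{deployment="ws",replica="ws#a"} 120.0
ray_serve_deployment_request_counter_total{deployment="ws",replica="ws#b"} 80.0
ray_serve_deployment_processing_latency_ms_sum{deployment="ws",replica="ws#a"} 2400.0
ray_serve_deployment_processing_latency_ms_count{deployment="ws",replica="ws#a"} 120.0
ray_serve_deployment_processing_latency_ms_sum{deployment="ws",replica="ws#b"} 2000.0
ray_serve_deployment_processing_latency_ms_count{deployment="ws",replica="ws#b"} 80.0
"""

REQUESTS = "ray_serve_deployment_request_counter_total"
LATENCY_SUM = "ray_serve_deployment_processing_latency_ms_sum"
LATENCY_COUNT = "ray_serve_deployment_processing_latency_ms_count"

# Matches lines of the form: metric_name{label="value",...} 123.0
LINE_RE = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def per_replica_stats(text):
    """Return {replica: {"requests": float, "mean_latency_ms": float or None}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank lines and samples without labels
        labels = dict(LABEL_RE.findall(match["labels"]))
        replica = labels.get("replica")
        if replica is None:
            continue  # only samples tagged with a replica are of interest
        # Sum over any other labels (route, application, ...) for the same replica.
        totals[replica][match["name"]] += float(match["value"])

    stats = {}
    for replica, values in totals.items():
        count = values.get(LATENCY_COUNT, 0.0)
        mean = values.get(LATENCY_SUM, 0.0) / count if count else None
        stats[replica] = {
            "requests": values.get(REQUESTS, 0.0),
            "mean_latency_ms": mean,
        }
    return stats


if __name__ == "__main__":
    for replica, s in sorted(per_replica_stats(SAMPLE).items()):
        print(replica, s["requests"], s["mean_latency_ms"])
```

In practice you would replace `SAMPLE` with the text scraped from your cluster's metrics endpoint, or compute the same sums directly in Prometheus and chart them in Grafana.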