How to find number of requests/messages per replica

mirage · July 3, 2025, 11:59am

On EKS I have 7 replicas. I have a websocket endpoint to which all clients send messages.
Now I want to know how I can find how many messages are received by each replica and what the response time is.

With HTTP requests this was easy to see, even at the AWS load balancer level, but since this is a websocket, the individual messages are hidden inside the connection.

I want to know if there is any metric exposed by Ray that can show me this.

Your QPS and latency graphs do not work with websocket messages.

christina · July 3, 2025, 8:36pm

Hi! Yes, Ray Serve provides several metrics that can help you track the performance and activity of your replicas. I’ll mention a few here that’ll be relevant to the things you mentioned:

ray_serve_deployment_request_counter_total - tracks the number of queries processed by each replica

ray_serve_deployment_processing_latency_ms - latency for queries processed by each replica

You can also set up Prometheus and Grafana to collect and chart these metrics. Refer to the Ray Serve monitoring documentation and the Ray Dashboard, which provides visualizations of these metrics.
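As a rough illustration of reading the request counter per replica, here is a minimal sketch that sums ray_serve_deployment_request_counter_total by replica from Prometheus text-format output. The sample text, the deployment and replica names, and the assumption that the metric carries a `replica` label are illustrative only, so check them against what your own metrics endpoint actually exposes. The helper name `requests_per_replica` is hypothetical and not part of Ray.

```python
import re
from collections import defaultdict

# Illustrative sample in Prometheus text format. This is NOT real Ray output;
# the label names and values here are assumptions made for the sketch.
SAMPLE = '''\
# TYPE ray_serve_deployment_request_counter_total counter
ray_serve_deployment_request_counter_total{deployment="ws",replica="ws#a1",route="/ws"} 120.0
ray_serve_deployment_request_counter_total{deployment="ws",replica="ws#b2",route="/ws"} 95.0
ray_serve_deployment_request_counter_total{deployment="ws",replica="ws#a1",route="/health"} 5.0
ray_serve_deployment_processing_latency_ms_sum{deployment="ws",replica="ws#a1"} 3000.0
'''

# Matches lines of the form: metric_name{label="value",...} 123.0
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def requests_per_replica(text,
                         metric="ray_serve_deployment_request_counter_total",
                         label="replica"):
    """Sum the samples of one counter metric, grouped by a label value."""
    totals = defaultdict(float)
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        match = LINE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue  # skip other metrics and lines without labels
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if label in labels:
            totals[labels[label]] += float(match.group("value"))
    return dict(totals)


if __name__ == "__main__":
    for replica, count in sorted(requests_per_replica(SAMPLE).items()):
        print(replica, count)
```

In practice the same grouping is usually done with a Prometheus query or a Grafana panel rather than by hand. Also, as the original post points out, these built-in counters may reflect connections rather than individual websocket messages, so it is worth comparing the numbers against a known message volume.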

Read more here: Monitor Your Application — Ray 2.47.1
