Counting per replica queries / Reacting to tasks being assigned to replicas in router

AstraliteHeart · May 12, 2021, 12:14am

Hello Ray team,

I am looking for a way to get the number of queries assigned to each replica.

I work on a set of inference models that are called through HTTP routes and via .remote() from Locust and ad-hoc scripts. I use these to simulate the effects of user activity on different configurations (number of replicas, queue depth, etc.).

I am trying to supplement my load graphs with per-replica utilization graphs coming from Ray (they are nice to look at and help me eyeball issues).

I was not able to find a suitable API. So far the solution that worked for me was to inject code into serve.router.ReplicaSet._try_assign_replica that calls my ‘request counter’ agent to increment a per-replica counter, and to decrement that counter in my actual handler after the task is done.
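For illustration, the bookkeeping side of such a ‘request counter’ agent might look like the minimal sketch below. The class name RequestCounter, its methods, and the replica_tag string are hypothetical. How the router identifies a replica depends on the Ray version. The class is kept as plain Python so it can be tested on its own. To run it as a Ray actor, it would be decorated with @ray.remote and called via .remote(), with the counts read back through ray.get.

```python
import threading
from collections import Counter
from typing import Dict


class RequestCounter:
    """Hypothetical per-replica counter of in-flight queries.

    Plain thread-safe class; wrapping it with @ray.remote would turn it into
    an actor. A default (non-threaded) Ray actor handles calls one at a time,
    so the lock only matters when the object is shared across threads.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: Counter = Counter()

    def assigned(self, replica_tag: str) -> int:
        # Called when the router hands a query to a replica.
        with self._lock:
            self._in_flight[replica_tag] += 1
            return self._in_flight[replica_tag]

    def finished(self, replica_tag: str) -> int:
        # Called from the handler once the query is done; never drops below zero.
        with self._lock:
            if self._in_flight[replica_tag] > 0:
                self._in_flight[replica_tag] -= 1
            return self._in_flight[replica_tag]

    def snapshot(self) -> Dict[str, int]:
        # Copy of the current counts, e.g. for periodic sampling into a load graph.
        with self._lock:
            return dict(self._in_flight)


if __name__ == "__main__":
    counter = RequestCounter()
    counter.assigned("replica-a")
    counter.assigned("replica-a")
    counter.assigned("replica-b")
    counter.finished("replica-a")
    print(counter.snapshot())
```

Sampling snapshot() on a timer from the load-test side gives one utilization series per replica. The clamp in finished() keeps a stray or duplicated "done" call from producing negative counts.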

Changing the library code for this seems like an iffy solution, and monkey patching is not an option because the HTTP handler uses a separate ASGI process to which (AFAIK) I have no access (on top of every other reason why monkey patching is a bad idea).

The docs mention a serve_replica_queued_queries system metric (is it actually called backend_queued_queries_total?), but I am on Windows, so I currently have no dashboard or metrics export. I tried to access the collected metric directly via the API but also ran into issues.

Any recommendations?
