I want to get some metrics from the ServeController about deployments and replicas.
You can access deployment and replica metrics in Ray Serve through its built-in metrics, such as ray_serve_deployment_request_counter_total (number of queries processed per replica), ray_serve_deployment_processing_latency_ms (processing latency per replica), and ray_serve_replica_processing_queries (queries currently being processed per replica). These metrics are exposed in Prometheus format on each node. You can view them in the Ray Dashboard or scrape them programmatically from the node's metrics endpoint (default port 8080) after starting Ray with --metrics-export-port=8080, as described in the Serve monitoring documentation.
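As a minimal sketch of scraping these metrics programmatically: the snippet below fetches the exposition-format text from a node's metrics endpoint and keeps only the ray_serve_ samples. The host, port, and /metrics path are assumptions based on the default Prometheus setup; adjust them to match how you started Ray. The parser is deliberately simplified (it splits on the last space, so it assumes no spaces inside label values).

```python
import urllib.request


def parse_prometheus_text(text: str, prefix: str = "ray_serve_") -> dict:
    """Parse Prometheus exposition-format text into {metric_with_labels: value},
    keeping only metrics whose name starts with `prefix`.

    Simplified parser: skips comment/blank lines and splits each sample on
    the last space, so label values must not contain spaces.
    """
    samples = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        name_and_labels, _, value = line.rpartition(" ")
        if name_and_labels.startswith(prefix):
            samples[name_and_labels] = float(value)
    return samples


def scrape_serve_metrics(host: str = "localhost", port: int = 8080) -> dict:
    # Assumes Ray was started with --metrics-export-port=8080 and that the
    # endpoint serves Prometheus text at /metrics (check your deployment).
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

You could then filter the returned dict by deployment or replica label to build per-replica dashboards or alerts.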
To get deployment and replica status programmatically in Python, use the serve.status() API, which returns a dataclass describing each deployment, its replicas, and their current states. For more granular or custom metrics, you can define and emit your own metrics from within deployment code, tagging them with deployment and replica metadata for later aggregation and analysis. See the Serve monitoring documentation for examples.
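As a rough sketch of working with that status programmatically: in a live cluster you would call serve.status() and read the returned dataclass; below, a small helper aggregates replica states from a plain-dict snapshot of that status. The field names used here ("applications", "deployments", "replica_states") are assumptions for illustration, not a guaranteed schema; check serve.status() in your Ray version.

```python
from collections import Counter

# In a live cluster you would obtain the status via Ray Serve, e.g.:
#   from ray import serve
#   status = serve.status()  # ServeStatus dataclass
# Here we operate on a plain-dict snapshot of that status. The keys
# "applications", "deployments", and "replica_states" are hypothetical
# and chosen only to illustrate the aggregation.


def summarize_replicas(status: dict) -> dict:
    """Return {deployment_name: Counter of replica states} from a status snapshot."""
    summary = {}
    for app in status.get("applications", {}).values():
        for name, dep in app.get("deployments", {}).items():
            summary[name] = Counter(dep.get("replica_states", {}))
    return summary


snapshot = {
    "applications": {
        "default": {
            "deployments": {
                "Model": {"status": "HEALTHY", "replica_states": {"RUNNING": 2}},
                "Router": {
                    "status": "UPDATING",
                    "replica_states": {"RUNNING": 1, "STARTING": 1},
                },
            }
        }
    }
}

print(summarize_replicas(snapshot))
```

A summary like this is handy for quick health checks, e.g. alerting when any deployment has replicas stuck outside the RUNNING state.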
Would you like more detail or code examples?
Hint: Mention @RunLLM in the post for followups.