I want to get some metrics from the ServeController about deployments and replicas.
You can access deployment and replica metrics in Ray Serve through its built-in metrics, such as ray_serve_deployment_request_counter_total (number of queries processed per replica), ray_serve_deployment_processing_latency_ms (processing latency per replica), and ray_serve_replica_processing_queries (queries currently being processed per replica). These metrics are exposed in Prometheus format on each node. You can view them in the Ray Dashboard or scrape them programmatically from the node's metrics endpoint (default port 8080) after starting Ray with --metrics-export-port=8080, as described in the Serve monitoring documentation.
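As a minimal sketch of scraping these metrics programmatically: the snippet below fetches the exposition-format text from a node's metrics endpoint and keeps only the ray_serve_ samples. The host, port, and /metrics path are assumptions based on the default Prometheus setup; adjust them to match how you started Ray. The parser is deliberately simplified (it splits on the last space, so it assumes no spaces inside label values).

```python
import urllib.request


def parse_prometheus_text(text: str, prefix: str = "ray_serve_") -> dict:
    """Parse Prometheus exposition-format text into {metric_with_labels: value},
    keeping only metrics whose name starts with `prefix`.

    Simplified parser: skips comment/blank lines and splits each sample on
    the last space, so label values must not contain spaces.
    """
    samples = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        name_and_labels, _, value = line.rpartition(" ")
        if name_and_labels.startswith(prefix):
            samples[name_and_labels] = float(value)
    return samples


def scrape_serve_metrics(host: str = "localhost", port: int = 8080) -> dict:
    # Assumes Ray was started with --metrics-export-port=8080 and that the
    # endpoint serves Prometheus text at /metrics (check your deployment).
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

You could then filter the returned dict by deployment or replica label to build per-replica dashboards or alerts.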
To get deployment and replica status programmatically in Python, use the serve.status() API, which returns a dataclass describing each deployment, its replicas, and their current states. For more granular or custom metrics, you can define and emit your own metrics from within deployment code, tagging them with deployment and replica metadata for later aggregation and analysis. See the Serve monitoring documentation for examples.
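As a rough sketch of working with that status programmatically: in a live cluster you would call serve.status() and read the returned dataclass; below, a small helper aggregates replica states from a plain-dict snapshot of that status. The field names used here ("applications", "deployments", "replica_states") are assumptions for illustration, not a guaranteed schema; check serve.status() in your Ray version.

```python
from collections import Counter

# In a live cluster you would obtain the status via Ray Serve, e.g.:
#   from ray import serve
#   status = serve.status()  # ServeStatus dataclass
# Here we operate on a plain-dict snapshot of that status. The keys
# "applications", "deployments", and "replica_states" are hypothetical
# and chosen only to illustrate the aggregation.


def summarize_replicas(status: dict) -> dict:
    """Return {deployment_name: Counter of replica states} from a status snapshot."""
    summary = {}
    for app in status.get("applications", {}).values():
        for name, dep in app.get("deployments", {}).items():
            summary[name] = Counter(dep.get("replica_states", {}))
    return summary


snapshot = {
    "applications": {
        "default": {
            "deployments": {
                "Model": {"status": "HEALTHY", "replica_states": {"RUNNING": 2}},
                "Router": {
                    "status": "UPDATING",
                    "replica_states": {"RUNNING": 1, "STARTING": 1},
                },
            }
        }
    }
}

print(summarize_replicas(snapshot))
```

A summary like this is handy for quick health checks, e.g. alerting when any deployment has replicas stuck outside the RUNNING state.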
Would you like more detail or code examples?
Hint: Mention @RunLLM in the post for followups.