I deployed a Ray cluster on Kubernetes using KubeRay, and I want to monitor it with Prometheus metrics. From the Ray documentation, I know that a service discovery file is generated on the head node at /tmp/ray/prom_metrics_service_discovery.json. With the Prometheus config below, Prometheus automatically updates the addresses it scrapes based on the contents of Ray’s service discovery file.
# Prometheus config file
# my global config
global:
  scrape_interval: 2s
  evaluation_interval: 2s
# Scrape from Ray.
scrape_configs:
- job_name: 'ray'
  file_sd_configs:
  - files:
    - '/tmp/ray/prom_metrics_service_discovery.json'
But since I am using Kubernetes, in my limited experience the most convenient way to configure Prometheus to scrape the Ray metrics would be to expose the metrics configuration through Service annotations, like this:
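A minimal sketch of what such an annotated Service could look like, assuming the widely used `prometheus.io/*` annotation convention. These annotations are not built into Prometheus; they only take effect if the Prometheus scrape config contains relabel rules that honor them. The Service name, the cluster name, and port 8080 for the metrics endpoint are assumptions for illustration, as are the `ray.io/*` labels, which should be checked against what KubeRay actually sets in your deployment.

```yaml
apiVersion: v1
kind: Service
metadata:
  # Hypothetical name; use the head Service of your own cluster.
  name: example-raycluster-head-svc
  labels:
    # Assumed KubeRay cluster label; verify on your own resources.
    ray.io/cluster: example-raycluster
  annotations:
    # Convention-based annotations, honored only via relabel rules.
    prometheus.io/scrape: "true"
    # Assumes the Ray metrics export port is 8080.
    prometheus.io/port: "8080"
spec:
  selector:
    # Assumed KubeRay pod labels selecting the head pod.
    ray.io/cluster: example-raycluster
    ray.io/node-type: head
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
```

A Service like this only selects the head pod, so worker metrics would need their own Service or equivalent pod-level annotations.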
I am not very familiar with how Prometheus is configured in K8s. Would you mind creating an issue and writing a simple proposal? I am also curious whether you would be willing to contribute it!
Is it possible to scrape metrics from more than one Ray cluster simultaneously by adding this Kubernetes Service (from kk17)? @Dmitri @kk17
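One way this might work, as a sketch rather than a confirmed setup: Prometheus's Kubernetes service discovery finds every annotated Service in the cluster, so several Ray clusters could be scraped by a single job and told apart by a label. The config below assumes each Ray head Service carries `prometheus.io/scrape` and `prometheus.io/port` annotations and a `ray.io/cluster` label; the job name `ray-k8s` and the target label `ray_cluster` are hypothetical names chosen for illustration.

```yaml
scrape_configs:
  - job_name: 'ray-k8s'   # hypothetical job name
    kubernetes_sd_configs:
      # Discover the endpoints behind every Service in the cluster.
      - role: endpoints
    relabel_configs:
      # Keep only Services annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Rewrite the scrape address to use the annotated port.
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Copy the assumed ray.io/cluster Service label into each series
      # so metrics from different Ray clusters stay distinguishable.
      - source_labels: [__meta_kubernetes_service_label_ray_io_cluster]
        target_label: ray_cluster
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
```

With this shape, adding another Ray cluster would not require a Prometheus config change, provided its Service follows the same annotation and label scheme and Prometheus has RBAC permission to list endpoints in that namespace.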