Exposing KubeRay prometheus metrics configuration on head service annotations

I deployed a Ray cluster on Kubernetes using KubeRay, and I want to monitor the cluster using Prometheus metrics. After reading the Ray documentation, I learned that a service discovery file is generated on the head node at /tmp/ray/prom_metrics_service_discovery.json. Using the Prometheus config below, Prometheus will automatically update the addresses it scrapes based on the contents of Ray's service discovery file.

# Prometheus config file

# my global config
global:
  scrape_interval:     2s
  evaluation_interval: 2s

# Scrape from Ray.
scrape_configs:
- job_name: 'ray'
  file_sd_configs:
  - files:
    - '/tmp/ray/prom_metrics_service_discovery.json'

But since I am using Kubernetes, in my humble experience the most convenient way to configure Prometheus to scrape the Ray metrics would be to expose the metrics configuration via service annotations, like this:

apiVersion: v1
kind: Service
metadata:
  name: xxx
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /metrics
    prometheus.io/port: "8081"

Is there any way to achieve this?

cc: @Dmitri @sangcho for thoughts
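
For context, the `prometheus.io/*` annotations are not interpreted by Prometheus itself; they only take effect when the Prometheus config includes a `kubernetes_sd_configs` job with matching `relabel_configs`. A sketch of that conventional setup (similar to the defaults shipped with the community Prometheus Helm chart; the job name here is illustrative):

```yaml
# Sketch of the scrape job that makes the prometheus.io/* service
# annotations work. Not KubeRay-specific; this is the common
# annotation-driven relabeling convention.
scrape_configs:
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # Keep only endpoints whose Service has prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  # Use prometheus.io/path as the metrics path (defaults to /metrics)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  # Use prometheus.io/port to override the scraped port
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
```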

I am not very familiar with how Prometheus is configured in K8s. Do you mind creating an issue and writing a simple proposal? I am also curious whether you would be willing to contribute it!

There’s currently limited ability to configure the service created by the KubeRay operator.
The limitation is tracked here: [Feature] Customise Ray head service in RayCluster CR · Issue #625 · ray-project/kuberay · GitHub
I will link this discussion in the issue as well.

I manually added a Kubernetes service to solve this problem.

apiVersion: v1
kind: Service
metadata:
  # labels:
  #   app.kubernetes.io/name: kuberay-metrics
  #   ray.io/cluster: {{ include "ray-cluster.fullname" . }}
  name: {{ include "ray-cluster.fullname" . }}-metrics-svc
  annotations:
    {{- if .Values.prometheus.enable }}
    prometheus.io/scrape: "{{.Values.prometheus.enable }}"
    prometheus.io/path: /metrics
    prometheus.io/port: "8080"
    {{- end }}
spec:
  ports:
  - name: metrics
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app.kubernetes.io/name: kuberay
    ray.io/cluster: {{ include "ray-cluster.fullname" . }}
  type: ClusterIP
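
For completeness, the template above gates the annotations on a `prometheus.enable` flag in the chart's values. A minimal values.yaml fragment matching it might look like this (the key name is taken from the template; everything else about your values file will differ):

```yaml
# Hypothetical values.yaml fragment for the template above.
prometheus:
  enable: true
```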

That work-around sounds reasonable for now.
I’ve opened an issue to track exposing service annotations.

Is it possible to scrape metrics from more than one Ray cluster simultaneously by adding this Kubernetes service (from kk17)? @Dmitri @kk17

cc: @Kai-Hsun_Chen for thoughts too
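
If each cluster gets its own metrics Service like the one above, a single annotation-driven Prometheus job should pick them all up; a relabeling rule can then copy the `ray.io/cluster` Service label onto the scraped samples so clusters can be told apart. A hedged sketch, assuming the commented-out `ray.io/cluster` label in the Service metadata above is uncommented:

```yaml
# Sketch: scrape several Ray clusters with one job and distinguish
# their samples via a ray_cluster target label. Assumes each cluster
# has a metrics Service labeled ray.io/cluster: <cluster-name>.
- job_name: 'ray-clusters'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_label_ray_io_cluster]
    action: replace
    target_label: ray_cluster
```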