Exposing KubeRay prometheus metrics configuration on head service annotations

I deployed a Ray cluster on Kubernetes using KubeRay and I want to monitor the cluster with Prometheus metrics. After reading the Ray documentation, I know that a service discovery file is generated on the head node at /tmp/ray/prom_metrics_service_discovery.json. With the Prometheus config below, Prometheus will automatically update the addresses that it scrapes based on the contents of Ray’s service discovery file.

# Prometheus config file

# my global config
global:
  scrape_interval:     2s
  evaluation_interval: 2s

# Scrape from Ray.
scrape_configs:
- job_name: 'ray'
  file_sd_configs:
  - files:
    - '/tmp/ray/prom_metrics_service_discovery.json'

But since I am using Kubernetes, in my humble experience the most convenient way to configure Prometheus to scrape the Ray metrics would be to expose the metrics configuration through service annotations, like this:

apiVersion: v1
kind: Service
metadata:
  name: xxx
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /metrics
    prometheus.io/port: "8081"
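For context, the `prometheus.io/*` annotations are a community convention rather than a built-in Prometheus feature: the Prometheus server has to be configured with a Kubernetes service-discovery job that filters and rewrites targets based on those annotations. A minimal sketch of such a scrape config (the job name is arbitrary; this assumes a vanilla Prometheus with RBAC access to the Kubernetes API):

```yaml
scrape_configs:
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # Only keep targets whose Service has prometheus.io/scrape: "true".
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  # Override the metrics path from prometheus.io/path, if set.
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  # Override the port from prometheus.io/port, if set.
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
```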

Is there any way to achieve this?

cc: @Dmitri @sangcho for thoughts

I am not very familiar with how Prometheus is configured in K8s. Do you mind creating an issue and writing a simple proposal? I am also curious whether you would be willing to contribute it!

There’s currently limited ability to configure the service created by the KubeRay operator.
The limitation is tracked here: [Feature] Customise Ray head service in RayCluster CR · Issue #625 · ray-project/kuberay · GitHub
I will link this discussion in the issue as well.

I manually added a Kubernetes Service to work around this problem.

apiVersion: v1
kind: Service
metadata:
  # labels:
  #   app.kubernetes.io/name: kuberay-metrics
  #   ray.io/cluster: {{ include "ray-cluster.fullname" . }}
  name: {{ include "ray-cluster.fullname" . }}-metrics-svc
  {{- if .Values.prometheus.enable }}
  annotations:
    prometheus.io/scrape: "{{ .Values.prometheus.enable }}"
    prometheus.io/path: /metrics
    prometheus.io/port: "8080"
  {{- end }}
spec:
  ports:
  - name: metrics
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app.kubernetes.io/name: kuberay
    ray.io/cluster: {{ include "ray-cluster.fullname" . }}
  type: ClusterIP
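As an aside, if the cluster runs the Prometheus Operator rather than a vanilla Prometheus, annotations are not needed at all: a ServiceMonitor can select the extra Service by label instead. A sketch, assuming the commented-out `app.kubernetes.io/name: kuberay-metrics` label on the Service above is enabled (the resource name and scrape interval here are made up for illustration):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ray-metrics          # hypothetical name
spec:
  selector:
    matchLabels:
      # Must match labels set on the metrics Service.
      app.kubernetes.io/name: kuberay-metrics
  endpoints:
  - port: metrics            # the named port on the Service
    interval: 15s
```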

That work-around sounds reasonable for now.
I’ve opened an issue to track exposing service annotations.