Grafana Dashboard shows No Data for GPU metrics

1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.

2. Environment:

  • Ray version: 2.47.1
  • Python version: 3.12
  • OS: Unknown
  • Cloud/Infrastructure: EC2
  • Other libs/tools (if relevant):

3. What happened vs. what you expected:

  • Expected: the Grafana dashboard to show GPU usage, as the Ray Dashboard does
  • Actual: the Ray Dashboard shows GPU usage, but Grafana doesn’t

Hello. I’m running a Ray workload on a multi-node setup. The Ray Dashboard correctly shows the GPU usage metrics.

But in Grafana, I see no data for the GPU metrics. I can only see CPU statistics.
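
For reference, this is the kind of quick check that could confirm whether Prometheus itself has any GPU series at all, or whether the gap is only between Prometheus and Grafana. It is a minimal sketch, assuming Prometheus listens on localhost:9090 (as in the config below) and that the GPU panels are backed by a metric named something like ray_node_gpus_utilization; that metric name is a guess on my side, not taken from the dashboard JSON.

# Quick check: does Prometheus itself return any Ray GPU series?
# Assumptions: Prometheus is reachable on localhost:9090, and the GPU panels
# query a metric named like "ray_node_gpus_utilization" (a guess, not confirmed).
import json
import urllib.parse
import urllib.request

PROM_URL = "http://localhost:9090"
QUERY = "ray_node_gpus_utilization"

url = PROM_URL + "/api/v1/query?" + urllib.parse.urlencode({"query": QUERY})
with urllib.request.urlopen(url, timeout=10) as resp:
    data = json.load(resp)

result = data.get("data", {}).get("result", [])
if result:
    for series in result:
        print(series["metric"], series["value"])
else:
    print("Prometheus returned no series for", QUERY)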

This is my prometheus.yml file:

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  - job_name: 'ray'
    static_configs:
      - targets: ['<HEAD_NODE_IP>:8080']
    metrics_path: '/metrics'
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
        labels:
          app: "prometheus"

Can you please help? Thank you!