Ray Monitor Not Connecting to Grafana and Prometheus

I am running Ray 2.3.1 on my Mac Pro. I also have Grafana and Prometheus running on this machine. I have verified that both are working by checking localhost:3000 and localhost:9090, respectively. I launch a local Ray cluster like so

export RAY_GRAFANA_HOST=http://127.0.0.1:3000
export RAY_PROMETHEUS_HOST=http://127.0.0.1:9090
ray start --head

Ray starts. The Ray monitor at 127.0.0.1:8265 shows broken cluster monitoring windows. The screen looks like this

If I hover over one of the windows I see the message “127.0.0.1 refused to connect”.

The Ray cluster itself works correctly, as does the Recent jobs tab of the monitor.

I have tried adding export RAY_GRAFANA_IFRAME_HOST=http://127.0.0.1:3000, as well as not setting any of these environment variables, and see the same result.

I watched the web traffic with Chrome developer tools while refreshing the Ray monitor web page. The following things looked wrong:

  • Two calls to roboto-latin.500 on the Ray monitor port failed with the message “Failed to load response data. No data found for resource for given identifier” in the Response tab.
  • Two calls to default-dashboard?... on the Grafana port showed the message “Failed to load response data: No content available because this request was redirected” in the Response tab.
  • Two calls to login on the Grafana port showed the message “Failed to load response data. No resource with the given identifier found” in the Response tab.
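
The failed requests listed above fit a Grafana instance that refuses to be embedded. A redirect to login suggests anonymous access is off, and "refused to connect" is what a browser typically shows when a page forbids framing. The following is a minimal Python sketch, not part of the original post, for inspecting a response from the Grafana port. The function names are hypothetical, and the assumption that Grafana signals a framing ban through the X-Frame-Options header should be checked against your Grafana version.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 3xx response stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status_and_headers(url, timeout=5.0):
    """Return (status, headers) for a GET on url without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        # With redirects disabled, 3xx as well as 4xx/5xx responses arrive here.
        return err.code, dict(err.headers.items())


def diagnose_embedding(status, headers):
    """List likely reasons a Grafana page would fail inside an iframe."""
    lowered = {k.lower(): v for k, v in headers.items()}
    problems = []
    location = lowered.get("location", "")
    if 300 <= status < 400 and "login" in location:
        problems.append("redirected to login: anonymous access looks disabled")
    # Assumption: a framing ban is announced via the X-Frame-Options header.
    frame_opt = lowered.get("x-frame-options", "").lower()
    if frame_opt in ("deny", "sameorigin"):
        problems.append("X-Frame-Options=" + frame_opt + ": embedding is blocked")
    return problems
```

With Grafana running, diagnose_embedding(*fetch_status_and_headers("http://127.0.0.1:3000/")) returns an empty list when neither symptom is present.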

How do I get Grafana and Prometheus to integrate with Ray?


@rickyyx @sangcho Do we have specific instructions on how to install Grafana and Prometheus on localhost, and on how the Ray dashboard discovers their configs?

Does @wpm have to use the dashboard command, ray dashboard [-p <port, 8265 by default>] <cluster config file>?

To the best of my knowledge I followed the documentation instructions you linked to correctly.

I’ll try running ray dashboard <cluster config file>, but I don’t know where my cluster config file is. I’m just having Ray create a local cluster by default.

ray dashboard is not needed for a local Ray cluster. Hmm, I just tried setting those up on my MacBook Pro and it worked fine.

  • Grafana 9.4.7
  • Prometheus 2.43.0
  • Grafana started with brew services start grafana
  • Prometheus started with docker run -p 9090:9090 prom/prometheus
  • I don’t think I have any dashboards in Grafana. I poked around the UI and didn’t see anything.

I have a dashboards page that looks like this

According to: Metrics — Ray 2.3.1


You need to start prometheus and grafana with the config files provided by Ray so that:

  • Prometheus can scrape the metrics from the Ray cluster properly
  • Grafana can talk to Prometheus and visualize the metrics with the template dashboard provided by Ray

Can you give it a try?
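
To check the first point, Prometheus exposes its scrape targets over its HTTP API at /api/v1/targets. The following Python sketch (function names are hypothetical, and the default URL is the Prometheus address used earlier in this thread) lists the active targets that are not healthy.

```python
import json
import urllib.request


def unhealthy_targets(targets_payload):
    """Return (scrape_url, health, last_error) for active targets that are not up.

    targets_payload is the decoded JSON body of Prometheus' /api/v1/targets.
    """
    active = targets_payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl", ""), t.get("health", "unknown"), t.get("lastError", ""))
        for t in active
        if t.get("health") != "up"
    ]


def fetch_targets(prometheus_url="http://127.0.0.1:9090", timeout=5.0):
    """Download and decode the target list from a running Prometheus server."""
    url = prometheus_url + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling unhealthy_targets(fetch_targets()) while Ray is running should return an empty list once Prometheus is scraping correctly. If the payload contains no active targets at all, Prometheus is probably not using the config file that Ray generated.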

That worked. The Cluster Utilization and Node Count windows now display data.

For the benefit of anybody else who hits this, here is exactly how I made this work on my Mac.

  1. brew install grafana

  2. brew install prometheus

  3. Change the --config-file line in /usr/local/etc/prometheus.args to read --config.file /tmp/ray/session_latest/metrics/prometheus/prometheus.yml.

  4. Uncomment the appropriate lines in /usr/local/etc/grafana/grafana.ini so that it matches the contents of /tmp/ray/session_latest/metrics/grafana/grafana.ini.

  5. brew services start grafana

  6. brew services start prometheus

  7. ray start --head
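
Step 4 is a manual comparison of two INI files. As a rough aid, the Python sketch below lists the settings that are active in one INI text but missing or different in another. It assumes both files can be parsed by the standard configparser module, and the function names are hypothetical.

```python
import configparser


def _parse_ini(text):
    """Parse INI text leniently; Grafana files may have keys before any section."""
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, allow_no_value=True
    )
    # A placeholder section keeps configparser from rejecting top-level keys.
    parser.read_string("[__top__]\n" + text)
    return parser


def settings_to_fix(reference_text, local_text):
    """Return (section, key, expected, actual) for settings that differ.

    Every active setting in reference_text must appear with the same value in
    local_text; commented-out lines count as unset (actual is None).
    """
    reference = _parse_ini(reference_text)
    local = _parse_ini(local_text)
    mismatches = []
    for section in reference.sections():
        for key, expected in reference.items(section):
            actual = local.get(section, key, fallback=None)
            if actual != expected:
                mismatches.append((section, key, expected, actual))
    return mismatches
```

Passing the contents of /tmp/ray/session_latest/metrics/grafana/grafana.ini as reference_text and the contents of /usr/local/etc/grafana/grafana.ini as local_text should list the lines that still need to be uncommented or changed.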

Thanks for your help.


Glad that it works out!

@aguo We probably should add some guides for homebrew-based workflows ^. Added it to our backlog.

From the command line, it can work like this:

./prometheus --config.file=/tmp/ray/session_latest/metrics/prometheus/prometheus.yml

grafana-server --config /tmp/ray/session_latest/metrics/grafana/grafana.ini web

My first question: on a child node machine, the embedded metrics view cannot display charts, but on the head node it can. How do I configure things so that the child node displays them correctly?

My second question: when I use docker-compose to bring up the Grafana and Prometheus containers, the embedded metrics part of the Ray dashboard cannot find any charts. The docker-compose.yml file I used is listed here. It seems the network is not set up correctly.

version: '3'

networks:
    ray_dashboard:
        driver: bridge

services:
    prometheus:
        image: prom/prometheus
        container_name: prometheus
        hostname: prometheus
        restart: always
        volumes:
            - /tmp/ray/session_latest/metrics/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
        ports:
            - "9090:9090"
        networks:
            - ray_dashboard
        # network_mode: "host"

    grafana:
        image: grafana/grafana
        container_name: grafana
        hostname: grafana
        restart: always
        # environment:
          # GF_PATHS_CONFIG: /tmp/ray/session_latest/metrics/grafana/grafana.ini
        volumes:
            - /tmp/ray/session_latest/metrics/grafana/grafana.ini:/etc/grafana/grafana.ini
        ports:
            - "3000:3000"
        networks:
            - ray_dashboard

The official user guide doesn’t cover this. Docker Compose is more convenient than downloading Grafana and Prometheus separately.

My first question: on a child node machine, the embedded metrics view cannot display charts, but on the head node it can. How do I configure things so that the child node displays them correctly?

I’m not sure I understand your question. Can you elaborate? All the env variables need to be set on the head node, and the dashboard process runs on the head node.

My second question: when I use docker-compose to bring up the Grafana and Prometheus containers.

Check out the setup guide and the requirements here: Configuring and Managing Ray Dashboard — Ray 2.5.1. We cannot cover every way to install and run Grafana and Prometheus, but as long as the setup meets the requirements listed in the documentation, it should work. Let us know if you still run into issues.

On the head node, start Ray so that child nodes can access the dashboard:

ray start --head --dashboard-host="0.0.0.0"

Grafana and Prometheus run on the head node:

grafana-server --config /tmp/ray/session_latest/metrics/grafana/grafana.ini web

./prometheus --config.file=/tmp/ray/session_latest/metrics/prometheus/prometheus.yml

On the head node, everything works. On the child node, the Grafana panels do not display.

For the latest Grafana OSS version, 10.0.0:

sudo grafana server --config /tmp/ray/session_latest/metrics/grafana/grafana.ini web

On the child node, the Grafana panels do not display.

You mean the worker node? What do you mean by Grafana couldn’t display?

I ran into this problem on Ray 2.4.0. On the LAN, every machine other than the head node that can reach the head node can open the dashboard in a web browser, but the embedded Grafana part does not display. I will downgrade later to reproduce the problem.

I don’t know what Ray 2.5.0 changed in the dashboard, but there is a new error. Please look at my screenshot.

Let’s go back to the dashboard issue.

On child node(192.168.0.104):

On head node(192.168.0.102):

This page mentions changing the environment variables RAY_GRAFANA_HOST and RAY_PROMETHEUS_HOST:
https://docs.ray.io/en/latest/cluster/configure-manage-dashboard.html#embed-grafana-in-dashboard

But this document, on starting a cluster from the CLI, doesn’t give an example of how to set the environment variables at runtime:
https://docs.ray.io/en/latest/ray-core/starting-ray.html#start-ray-cli
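
As the replies in this thread say, the variables have to be present in the environment of the ray start process on the head node. The shell form is the export lines from the first post. A Python sketch of the same idea follows. The function names are hypothetical, and it assumes that RAY_GRAFANA_IFRAME_HOST is the address the browser uses to load the embedded panels, so it should be an address reachable from the machine running the browser.

```python
import os
import subprocess


def ray_head_env(grafana_host, prometheus_host, iframe_host=None, base_env=None):
    """Return a copy of the environment with the dashboard variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["RAY_GRAFANA_HOST"] = grafana_host
    env["RAY_PROMETHEUS_HOST"] = prometheus_host
    if iframe_host is not None:
        # Assumption: this is the address the *browser* uses for the iframe.
        env["RAY_GRAFANA_IFRAME_HOST"] = iframe_host
    return env


def start_head(env):
    """Run the ray start command from this thread with env applied to it."""
    subprocess.run(
        ["ray", "start", "--head", "--dashboard-host=0.0.0.0"],
        env=env,
        check=True,
    )
```

For the LAN setup described here, one could try start_head(ray_head_env("http://127.0.0.1:3000", "http://127.0.0.1:9090", iframe_host="http://192.168.0.102:3000")) on the head node, where 192.168.0.102 is the head node address mentioned above.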

@funk_Jz the error in 2.5 indicates that prometheus.yml is a directory. Do you know why? It’s not related to Ray but rather to your system.

  1. Please use the head node to access Ray Dashboard. I don’t think it works on a worker node… cc: @sangcho
  2. If you don’t use VMs (Ray on Cloud VMs — Ray 2.5.1) or K8s (Ray on Kubernetes — Ray 2.5.1) to start the Ray cluster, try setting the env variables on the head node before starting the cluster manually.

Haha, I think it was a metaphysical issue (prometheus.yml had turned into a directory). I cleared out /tmp/ray and restarted the cluster, and now everything works.
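
One plausible cause, which nobody in this thread confirmed: when a bind mount in the short volumes syntax points at a host path that does not exist yet, Docker creates that path as a directory. With restart: always, the containers can come up before Ray has written its config files. The Python sketch below (hypothetical function name) checks the host paths from the docker-compose.yml above before the containers are started.

```python
import os


def classify_bind_sources(paths):
    """Map each host path to 'file', 'directory', or 'missing'."""
    result = {}
    for path in paths:
        if os.path.isfile(path):
            result[path] = "file"
        elif os.path.isdir(path):
            result[path] = "directory"
        else:
            result[path] = "missing"
    return result


# Host paths taken from the docker-compose.yml earlier in the thread.
RAY_CONFIG_FILES = [
    "/tmp/ray/session_latest/metrics/prometheus/prometheus.yml",
    "/tmp/ray/session_latest/metrics/grafana/grafana.ini",
]
```

Running classify_bind_sources(RAY_CONFIG_FILES) after ray start --head and before docker-compose up should report "file" for both paths. Any other value suggests starting Ray first, or cleaning up /tmp/ray as described above.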

  1. Please use the head node to access Ray Dashboard. I don’t think it works on a worker node… cc: @sangcho

For the issue itself, we embed the Grafana page in the dashboard, so your child node probably is not able to access the embedded Grafana. As @Huaiwei_Sun said, this is not a very well supported use case (but if you’d like to fix it, you should make sure the child node can access the Grafana UI).
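
A quick way to test that last condition from the child node is a plain TCP connection attempt to the Grafana port on the head node. This is a minimal Python sketch with a hypothetical function name. It only shows that the port is reachable, not that Grafana allows embedding.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

On the child node from this thread, can_reach("192.168.0.102", 3000) would need to return True before the embedded panels can load there.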