Embedding Grafana visualizations into Ray Dashboard

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.


I’ve set up a Ray head node in Docker, but I’m running into issues embedding Grafana visualizations into the Ray Dashboard. I run Prometheus and Grafana on separate servers. As far as I can tell, both are working as expected, and I can see metrics about the head node in Grafana. The dashboard itself also seems to work, except that the Metrics section says “Set up Prometheus and Grafana for better Ray Dashboard experience”.

Some things I’ve done:

  • Set RAY_GRAFANA_HOST, including the protocol and with no trailing slash
  • Set RAY_PROMETHEUS_HOST, including the protocol and with no trailing slash
  • Set RAY_GRAFANA_IFRAME_HOST (I might have done this one incorrectly: I set it to my-grafana-host/d/rayDefaultDashboard/dashboard-name)
  • In Grafana, under [security], set allow_embedding and cookie_secure to true and cookie_samesite to none
  • Enabled anonymous access in Grafana (is there a way to pass credentials from Ray to Grafana?)
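For reference, the Grafana settings in the last two bullets correspond to grafana.ini entries like the ones below. This is a minimal sketch that renders them with Python’s configparser; the values are the poster’s choices rather than stated Ray requirements, and it assumes anonymous access was enabled through Grafana’s standard `[auth.anonymous] enabled = true` key.

```python
import configparser
import io

# The [security] values listed above, plus anonymous access.
# Assumption: anonymous access was enabled via [auth.anonymous] enabled = true.
GRAFANA_SETTINGS = {
    "security": {
        "allow_embedding": "true",   # lets Grafana pages load inside an iframe
        "cookie_secure": "true",     # cookies are only sent over HTTPS
        "cookie_samesite": "none",   # allows cookies in a cross-site iframe
    },
    "auth.anonymous": {
        "enabled": "true",
    },
}


def render_grafana_ini(settings: dict) -> str:
    """Render {section: {key: value}} as grafana.ini-style text."""
    parser = configparser.ConfigParser()
    for section, values in settings.items():
        parser[section] = values
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_grafana_ini(GRAFANA_SETTINGS))
```

Note that cookie_secure = true only makes sense when Grafana itself is served over HTTPS.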

When I go to the /api/grafana_health endpoint, it returns a 200 response and says “Grafana running” with the correct grafanaHost. However, my dashboard isn’t listed in the dashboardUids section; maybe that’s the issue? Is there an environment variable or other setting I can use to change this, or am I missing a step somewhere?
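The same check can be scripted so that it also lists the dashboard URLs Ray expects, each of which can be opened in Grafana to confirm a dashboard with that UID actually exists there. This is a sketch under assumptions: the dashboard address uses the default port 8265, and the response may nest its fields under a `data` key; only `grafanaHost` and `dashboardUids` come from the post.

```python
import json
import urllib.request

# Assumption: the Ray dashboard is reachable on its default port, 8265.
DASHBOARD_URL = "http://127.0.0.1:8265"


def fetch_grafana_health(dashboard_url: str = DASHBOARD_URL, timeout: float = 5.0) -> dict:
    """GET <dashboard_url>/api/grafana_health and decode the JSON body."""
    url = f"{dashboard_url}/api/grafana_health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(payload: dict) -> tuple:
    """Return (grafanaHost, dashboardUids) from a health payload.

    Assumption: the fields may be nested under "data" or sit at the top
    level, so both layouts are accepted.
    """
    data = payload.get("data") or payload
    return data.get("grafanaHost"), data.get("dashboardUids") or {}


def dashboard_urls(grafana_host: str, uids: dict) -> list:
    """Grafana serves a dashboard by UID at <host>/d/<uid>."""
    return [f"{grafana_host}/d/{uid}" for uid in uids.values()]
```

Usage: `host, uids = summarize(fetch_grafana_health())`, then print `dashboard_urls(host, uids)` and open each URL. A "Dashboard not found" page suggests Grafana has no dashboard with the UID Ray is trying to embed.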

I also ran grep -r 'grafana' /tmp/ray/session_latest/logs/, but that didn’t turn up anything useful either.
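A Python equivalent of that search is sketched below. Unlike the grep command, it ignores case (like `grep -ri`), so lines that capitalize “Grafana” are also caught, and it raises an error for a non-existent directory instead of silently finding nothing, which makes a mistyped path obvious.

```python
from pathlib import Path

# Ray keeps /tmp/ray/session_latest pointing at the current session.
LOG_DIR = Path("/tmp/ray/session_latest/logs")


def grep_logs(log_dir: Path = LOG_DIR, needle: str = "grafana") -> list:
    """Return (file, line_number, line) for lines containing `needle`, ignoring case."""
    if not log_dir.is_dir():
        # Fail loudly so a typo in the path is not mistaken for "no matches".
        raise FileNotFoundError(f"{log_dir} is not a directory")
    needle = needle.lower()
    hits = []
    for path in sorted(log_dir.rglob("*")):
        if not path.is_file():
            continue
        with path.open("r", encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if needle in line.lower():
                    hits.append((str(path), lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for file, lineno, line in grep_logs():
        print(f"{file}:{lineno}: {line}")
```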

Thank you!

Maybe take a look at this PR, which improves the docs on this topic: “polish observability (o11y) docs” by scottsun94, Pull Request #39069 in ray-project/ray on GitHub.

This one doesn’t look right. It should be the address at which your browser (and the machine it runs on) can reach the Grafana server.
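To catch this kind of mistake before restarting Ray, the variables can be checked mechanically. This is a minimal sketch: the “include the protocol, no trailing slash” rules come from the post, while the rule that RAY_GRAFANA_IFRAME_HOST should be a bare scheme://host[:port] origin with no dashboard path is an assumption based on the reply above.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit


def check_host_var(name: str, value: str, origin_only: bool = False) -> list:
    """Return human-readable problems with one host variable (empty list if none)."""
    problems = []
    parts = urlsplit(value)
    if parts.scheme not in ("http", "https"):
        problems.append(f"{name}: missing http:// or https:// prefix")
    if value.endswith("/"):
        problems.append(f"{name}: trailing slash")
    if origin_only and parts.path not in ("", "/"):
        problems.append(f"{name}: contains a path ({parts.path}); expected scheme://host[:port]")
    return problems


def check_env(env: Optional[Mapping[str, str]] = None) -> list:
    """Check the three variables from the post; unset variables are skipped."""
    env = os.environ if env is None else env
    problems = []
    # Assumption: only the iframe host must be a bare origin the browser can open.
    for name, origin_only in (
        ("RAY_GRAFANA_HOST", False),
        ("RAY_PROMETHEUS_HOST", False),
        ("RAY_GRAFANA_IFRAME_HOST", True),
    ):
        value = env.get(name)
        if value is not None:
            problems.extend(check_host_var(name, value, origin_only))
    return problems


if __name__ == "__main__":
    for problem in check_env():
        print(problem)
```

With the value from the original post, the iframe check reports two problems: no protocol, and a dashboard path where only the Grafana origin is expected.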

So that would be the same address as RAY_GRAFANA_HOST, right? It didn’t seem to like that either 🙁

Yes, as long as your browser can reach it. Can you try opening the RAY_GRAFANA_HOST address directly in your browser?

Yup, that works. Running curl against the Grafana server from inside the Docker container where Ray is running works as well.
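The browser and curl checks can also be scripted from inside the container. In this sketch, any HTTP response (even a 401 or 404) counts as reachable, while a DNS, connection, or TLS failure counts as unreachable. With a self-signed certificate, the default TLS verification fails, so the check returns False. The /api/health path is Grafana’s health endpoint; the fallback to http://localhost:3000 (Grafana’s default port) is an assumption for illustration.

```python
import os
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """True if `url` returns any HTTP response; False on network/TLS errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so the network path works.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, or certificate verification error.
        return False


if __name__ == "__main__":
    grafana = os.environ.get("RAY_GRAFANA_HOST", "http://localhost:3000")
    print(grafana, is_reachable(f"{grafana}/api/health"))
```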

So both the Grafana and Prometheus health checks passed for you?

Oh, that might be it! It was failing because I had a self-signed certificate. I fixed that, and now it’s failing because I have authentication turned on for Prometheus. Is there a way to pass credentials or some kind of token to Ray?
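To confirm the credentials and certificate work independently of Ray, a sketch like the one below sends a Basic-auth request to Prometheus’s /-/healthy endpoint. It trusts a self-signed CA through a CA file rather than disabling TLS verification. It assumes Prometheus is protected with HTTP basic auth (for example, through its web configuration file). It does not make Ray itself send credentials, which is the open question here. The host, user, password, and CA path are placeholders.

```python
import base64
import ssl
import urllib.request
from typing import Optional


def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_request(prometheus_host: str, user: str, password: str) -> urllib.request.Request:
    """GET <host>/-/healthy, Prometheus's liveness endpoint, with credentials."""
    request = urllib.request.Request(f"{prometheus_host}/-/healthy")
    request.add_header("Authorization", basic_auth_header(user, password))
    return request


def check_prometheus(prometheus_host: str, user: str, password: str,
                     ca_file: Optional[str] = None, timeout: float = 5.0) -> int:
    """Return the HTTP status; raises urllib.error.URLError/HTTPError on failure.

    ca_file: placeholder path to the CA that signed a self-signed certificate.
    With ca_file=None, the system's default trust store is used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(build_request(prometheus_host, user, password),
                                timeout=timeout, context=context) as resp:
        return resp.status
```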

Hmm, I’m not sure. Can you first try turning off authentication for Prometheus to see if that works?

Another thing to note: if you use an existing Grafana instance, you may need to import the Ray-provided dashboard JSON files into it first. After you start the Ray cluster, you can find them in /tmp/ray/session_latest/metrics/grafana/dashboards. Copy those JSON files over and import them as Grafana dashboards.
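Importing through the Grafana UI works. As an alternative, the sketch below builds payloads for Grafana’s HTTP dashboard API (POST /api/dashboards/db) from the files in that directory. It assumes each JSON file is a plain Grafana dashboard model whose `uid` is what Ray embeds. It also assumes the token is a Grafana service-account token you create yourself.

```python
import json
import urllib.request
from pathlib import Path

# Directory named above; populated on the head node once the Ray cluster starts.
DASHBOARD_DIR = Path("/tmp/ray/session_latest/metrics/grafana/dashboards")


def build_import_payloads(dashboard_dir: Path = DASHBOARD_DIR) -> list:
    """One Grafana import payload per *.json file, keeping each dashboard's uid."""
    payloads = []
    for path in sorted(dashboard_dir.glob("*.json")):
        dashboard = json.loads(path.read_text(encoding="utf-8"))
        dashboard["id"] = None  # let Grafana assign its own numeric id
        payloads.append({"dashboard": dashboard, "overwrite": True})
    return payloads


def post_dashboard(grafana_host: str, payload: dict, token: str, timeout: float = 10.0) -> int:
    """POST one payload to Grafana's dashboard API; returns the HTTP status."""
    request = urllib.request.Request(
        f"{grafana_host}/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

Keeping the original `uid` is the important design choice. Ray looks dashboards up by UID, so renaming a dashboard is harmless, but a changed UID would leave it missing from Ray’s point of view. `overwrite: True` lets the import be re-run after a Ray upgrade.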

Turning off authentication for Prometheus did the trick. It would be nice to keep authentication turned on, but at least the visualizations are being embedded now. Thanks!

Great to hear that.

Any idea how we could support this? How could the Ray head node automatically authenticate to Prometheus in this case: with a credentials file, an environment variable, or something else?

cc: @aguo