How severely does this issue affect your experience of using Ray?
High: It blocks me from completing my task.
I’ve set up a Ray head node in Docker but am running into some issues with embedding Grafana visualizations into the dashboard. I have separate servers for Prometheus and Grafana. As far as I can tell these are working as expected, and I can see metrics about the head node in Grafana. The dashboard also seems to be working, except the metrics section says “Set up Prometheus and Grafana for better Ray Dashboard experience”.
Some things I’ve done:
Set RAY_GRAFANA_HOST with the protocol and no trailing slash
Set RAY_PROMETHEUS_HOST with the protocol and no trailing slash
Set RAY_GRAFANA_IFRAME_HOST (I might have done this one incorrectly; I set it to my-grafana-host/d/rayDefaultDashboard/dashboard-name)
In Grafana, under [security], I’ve set allow_embedding and cookie_secure to true, and cookie_samesite to none
Enabled anonymous access (is there a way to pass credentials from Ray to Grafana?)
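For anyone checking the same settings, here is a minimal sketch of a sanity check for the three environment variables listed above. The helper name `check_host_setting` is hypothetical, and treating a path component as a problem is an assumption on my part: the Ray examples I'm aware of use a bare address such as `http://host:3000` for these variables, not a dashboard URL.

```python
import os

def check_host_setting(name: str, value: str) -> list[str]:
    """Return a list of likely problems with a host-style setting (empty if none)."""
    problems = []
    if not value.startswith(("http://", "https://")):
        problems.append(f"{name}: missing http:// or https:// prefix")
    if value.endswith("/"):
        problems.append(f"{name}: has a trailing slash")
    # Drop the scheme and any trailing slash, then look for a path segment.
    rest = value.split("://", 1)[-1].rstrip("/")
    if "/" in rest:
        # Assumption: these variables expect host[:port] only, not a dashboard path.
        problems.append(f"{name}: contains a path; expected host[:port] only")
    return problems

if __name__ == "__main__":
    for var in ("RAY_GRAFANA_HOST", "RAY_PROMETHEUS_HOST", "RAY_GRAFANA_IFRAME_HOST"):
        value = os.environ.get(var)
        if value is None:
            print(f"{var}: not set")
            continue
        for problem in check_host_setting(var, value) or [f"{var}: looks fine"]:
            print(problem)
```

Run inside the head node container, this would flag a value like `my-grafana-host/d/rayDefaultDashboard/dashboard-name` for both the missing protocol and the extra path.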
When I go to the /api/grafana_health endpoint, it returns a 200 response and says “Grafana running” with the correct grafanaHost. However, I don’t see my dashboard in the dashboardUids section. Maybe that’s my issue? Is there an environment variable or something I can set to change this? Or maybe I’m missing a step somewhere?
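A small sketch of how the health response could be inspected from a script. The field names `grafanaHost` and `dashboardUids` come from the post above; the surrounding shape (`result`, `msg`, `data`) and the idea that `dashboardUids` maps a dashboard kind to a UID are assumptions about the response format, so adjust them to whatever your Ray version actually returns. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

def summarize_grafana_health(payload: dict) -> dict:
    """Pull the interesting fields out of a grafana_health response body."""
    data = payload.get("data") or {}          # assumed wrapper key
    uids = data.get("dashboardUids") or {}    # assumed: {"default": "<uid>", ...}
    uid_values = list(uids.values()) if isinstance(uids, dict) else list(uids)
    return {
        "ok": bool(payload.get("result")),    # assumed success flag
        "message": payload.get("msg", ""),
        "grafana_host": data.get("grafanaHost"),
        "dashboard_uids": uid_values,
        # rayDefaultDashboard is the UID mentioned earlier in this thread.
        "has_default_dashboard": "rayDefaultDashboard" in uid_values,
    }

def fetch_grafana_health(ray_dashboard_url: str) -> dict:
    """Query the Ray dashboard (e.g. http://localhost:8265) and summarize the result."""
    with urlopen(f"{ray_dashboard_url}/api/grafana_health", timeout=10) as resp:
        return summarize_grafana_health(json.load(resp))
```

Calling `fetch_grafana_health("http://localhost:8265")` (the URL is a placeholder) makes it easy to compare the reported UIDs against the UIDs of the dashboards that actually exist in Grafana.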
I also ran grep -r 'grafana' /tmp/ray/session_latest/logs/, but that didn’t turn up anything useful either.
Oh, that might be it! It was failing because I had a self-signed certificate. I fixed that, and now it’s failing because I have authentication turned on for Prometheus. Is there a way to pass credentials or some kind of token to Ray?
Hmm I’m not sure. Can you try turning off the auth for Prometheus first and see if it works?
Another thing to note: if you use an existing Grafana instance, you may need to import the Ray-provided dashboard JSON files into it first. After you start the Ray cluster, you can find them at /tmp/ray/session_latest/metrics/grafana/dashboards. Copy the JSON files over and import them as Grafana dashboards.
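If importing by hand gets tedious, the import step could be scripted along these lines. This is only a sketch: it assumes Grafana's HTTP API endpoint `POST /api/dashboards/db` with a bearer token, the URL and token are placeholders, and the function names are hypothetical. The directory is the one mentioned above.

```python
import json
from pathlib import Path
from urllib.request import Request, urlopen

RAY_DASHBOARD_DIR = "/tmp/ray/session_latest/metrics/grafana/dashboards"

def build_import_payload(dashboard: dict) -> dict:
    """Wrap a dashboard JSON document in the body expected by the import endpoint."""
    dash = dict(dashboard)   # shallow copy so the caller's dict is not modified
    dash["id"] = None        # let Grafana assign its own internal id; the uid is kept
    return {"dashboard": dash, "overwrite": True}

def import_dashboards(grafana_url: str, api_token: str, directory: str = RAY_DASHBOARD_DIR) -> None:
    """POST every *.json file in the directory to Grafana."""
    for path in sorted(Path(directory).glob("*.json")):
        payload = build_import_payload(json.loads(path.read_text()))
        request = Request(
            f"{grafana_url}/api/dashboards/db",
            data=json.dumps(payload).encode("utf-8"),
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {api_token}",  # placeholder token
            },
            method="POST",
        )
        with urlopen(request, timeout=10) as resp:
            print(path.name, resp.status)
```

Keeping the `uid` from the Ray-provided files matters here, since the embedded panels appear to be looked up by UID (for example `rayDefaultDashboard`).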