Issue with Ray Dashboard Setup on Windows
Prometheus
Hiya guys, I am new to Ray and facing an issue with the Ray Dashboard. For context, here is the relevant output of running systeminfo:
OS Name: Microsoft Windows 11 Pro
OS Version: 10.0.22621 N/A Build 22621
System Type: x64-based PC
Processor(s): Intel64 Family 6 Model 165 Stepping 5 GenuineIntel ~2904 Mhz
When running the Ray Dashboard on Windows, the Prometheus and Grafana setup doesn't work the way it is detailed in the docs. The docs say that to run the dashboard, you need to run the following command:
./prometheus --config.file=/tmp/ray/session_latest/metrics/prometheus/prometheus.yml
I understand that the symbolic link /tmp/ray/session_latest points to the latest active Ray session. This doesn't seem to work for me on Windows 11 (even after enabling symbolic links in my Git Bash setup). It results in:
msg="Error loading config (--config.file=C:/Users/Shav/AppData/Local/Temp/ray/session_latest/metrics/prometheus/prometheus.yml)" file=C:\Users\Shav\AppData\Local\Temp\ray\session_latest\metrics\prometheus\prometheus.yml err="open C:/Users/Shav/AppData/Local/Temp/ray/session_latest/metrics/prometheus/prometheus.yml: The system cannot find the path specified."
To get around this symbolic link issue, if I do the following to get the latest session:
CWD=$(pwd)
cd /tmp/ray
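# Get the most recently modified folder
SESSIONLATEST=$(ls -td */ | head -1)
SESSIONLATEST=${SESSIONLATEST%/}  # strip the trailing slash left by the */ glob
cd "$CWD/prometheus"
./prometheus --config.file="/tmp/ray/$SESSIONLATEST/metrics/prometheus/prometheus.yml"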
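For anyone who would rather not rely on ls ordering in Git Bash, the same lookup can be sketched in Python. This is only a rough sketch: the helper name latest_session_dir is made up for illustration, and it assumes Ray keeps its session_* folders under the system temp directory, as the paths in the error messages suggest.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_session_dir(ray_tmp: Path) -> Optional[Path]:
    """Return the most recently modified session_* directory, or None.

    Hypothetical helper, not part of Ray. It skips the session_latest
    link itself so only real session folders are considered.
    """
    if not ray_tmp.is_dir():
        return None
    candidates = [
        p for p in ray_tmp.glob("session_*")
        if p.is_dir() and p.name != "session_latest"
    ]
    if not candidates:
        return None
    # Newest modification time wins, mirroring `ls -td */ | head -1`.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # Assumption: Ray's temp root is <system temp>/ray.
    ray_tmp = Path(tempfile.gettempdir()) / "ray"
    session = latest_session_dir(ray_tmp)
    if session is None:
        print(f"No session directories found under {ray_tmp}")
    else:
        print(session / "metrics" / "prometheus" / "prometheus.yml")
```

The printed path can then be passed to --config.file in place of the session_latest path.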
I face another issue. It occurs with this setup, as well as when I create my own prometheus.yml file with the following contents:
# Prometheus config file
# my global config
global:
  scrape_interval: 2s
  evaluation_interval: 2s
# Scrape from Ray.
scrape_configs:
- job_name: "ray"
  file_sd_configs:
  - files:
    - "C:/Users/Shav/Documents/Code/ray-prometheus/prometheus/prom_metrics_service_discovery.json"
When the config references the Prometheus service discovery file at /tmp/ray/prom_metrics_service_discovery.json directly, as in the docs, I get this error:
"Error reading file" path=C:/Users/Shav/AppData/Local/Temp/ray/prom_metrics_service_discovery.json err="open C:/Users/Shav/AppData/Local/Temp/ray/prom_metrics_service_discovery.json: The process cannot access the file because it is being used by another process.
I am assuming that Ray periodically updates this file with the addresses of all metrics agents in the cluster, and therefore it is in use by that process. Is there an argument I am missing to enable sharing?
So far I have started the dashboard with the following in an .ipynb notebook:
import ray
if ray.is_initialized():
    ray.shutdown()
ray.init()
In the end, what I ended up doing was copying the file into my working directory:
cp -r C:/Users/Shav/AppData/Local/Temp/ray/prom_metrics_service_discovery.json .
./prometheus.exe --config.file=prometheus.yml
Which fixes my issues.
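A one-off copy presumably goes stale if Ray rewrites the discovery file later, so the copy may need repeating. Below is a rough sketch of a copy that retries when Windows reports the source as locked. The function name copy_with_retry and its parameters are made up for illustration, and it assumes the lock shows up in Python as an OSError.

```python
import shutil
import time
from pathlib import Path


def copy_with_retry(src: Path, dst: Path, attempts: int = 5, delay_s: float = 0.2) -> bool:
    """Copy src to dst, retrying if the OS refuses access to either file.

    Hypothetical helper. Returns True on success, False once all
    attempts are used up.
    """
    tmp = dst.with_name(dst.name + ".tmp")
    for _ in range(attempts):
        try:
            # Copy to a temporary name first, then swap it into place,
            # so a reader never sees a half-written destination file.
            shutil.copyfile(src, tmp)
            tmp.replace(dst)
            return True
        except OSError:
            # Covers a locked or missing source and a locked destination.
            time.sleep(delay_s)
    return False
```

Calling this in a loop (or from a scheduled task) with the Temp/ray path as src and the local copy as dst would keep the file that prometheus.yml points at reasonably fresh.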
Grafana
I faced similar issues creating a new Grafana server with the provided dashboards. The docs say to run:
./bin/grafana-server --config /tmp/ray/session_latest/metrics/grafana/grafana.ini web
Simply running this gives a similar error:
ERROR[09-19|13:27:35] failed to parse "C:/Users/Shav/AppData/Local/Temp/ray/session_latest/metrics/grafana/grafana.ini": open C:/Users/Shav/AppData/Local/Temp/ray/session_latest/metrics/grafana/grafana.ini: The system cannot find the path specified. logger=settings
So again, this time if I run:
CWD=$(pwd)
cd /tmp/ray
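# Get the most recently modified folder
SESSIONLATEST=$(ls -td */ | head -1)
SESSIONLATEST=${SESSIONLATEST%/}  # strip the trailing slash left by the */ glob
cd "$CWD/grafana/"
./bin/grafana-server --config "C:/Users/Shav/AppData/Local/Temp/ray/$SESSIONLATEST/metrics/grafana/grafana.ini" web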
I get the same not found errors:
ERROR[09-19|14:16:12] can't read datasource provisioning files from directory logger=provisioning.datasources path=C:\\Users\\Shav\\Documents\\Code\\ray-prometheus\\grafana\\tmp\\ray\\session_latest\\metrics\\grafana\\provisioning\\datasources error="open C:\\Users\\Shav\\Documents\\Code\\ray-prometheus\\grafana\\tmp\\ray\\session_latest\\metrics\\grafana\\provisioning\\datasources: The system cannot find the path specified."
This is because, in the grafana.ini in metrics/grafana/, the paths are set to:
[paths]
provisioning = /tmp/ray/session_latest/metrics/grafana/provisioning
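Which can't be resolved: judging by the path in the error above, Grafana looks for it under its own install directory, where it doesn't exist. I work around this by running: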
CWD=$(pwd)
cd /tmp/ray
# Get the latest folder
FOLDER=$(ls -td */ | head -1)
FOLDER=${FOLDER%/}  # strip the trailing slash left by the */ glob
cd "$FOLDER/metrics/grafana/"
# Replace "provisioning = ..." with the absolute Windows path to this session's provisioning folder
sed -i 's#provisioning = .*#provisioning = "C:/Users/Shav/AppData/Local/Temp/ray/'"$FOLDER"'/metrics/grafana/provisioning"#' grafana.ini
cat grafana.ini
# Back to CWD
cd $CWD/grafana/
./bin/grafana-server --config C:/Users/Shav/AppData/Local/Temp/ray/$FOLDER/metrics/grafana/grafana.ini web
Which finally gets me up and running.
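The sed step can also be done from Python, which avoids depending on Git Bash's sed. This is only a sketch of the same rewrite: the function name set_provisioning_path is made up for illustration, and it assumes grafana.ini has a single "provisioning = ..." line as shown above.

```python
import re
from pathlib import Path


def set_provisioning_path(ini_path: Path, provisioning_dir: Path) -> str:
    """Point the 'provisioning = ...' line in grafana.ini at provisioning_dir.

    Hypothetical helper. Rewrites the file in place and returns the new text.
    """
    text = ini_path.read_text(encoding="utf-8")
    # Forward slashes and quotes, matching what the sed command writes.
    new_line = f'provisioning = "{provisioning_dir.as_posix()}"'
    # A function replacement keeps the path from being parsed for escapes.
    new_text, count = re.subn(
        r"(?m)^provisioning\s*=.*$", lambda _match: new_line, text
    )
    if count == 0:
        raise ValueError(f"no 'provisioning = ...' line found in {ini_path}")
    ini_path.write_text(new_text, encoding="utf-8")
    return new_text
```

Here ini_path would be the session's metrics/grafana/grafana.ini and provisioning_dir the provisioning folder next to it, both as absolute Windows paths.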
Questions
Is this the intended behaviour on Windows? Is it an error in the documentation, or is it a bug? Am I missing something in terms of setup, or over-complicating things?
Thanks in advance, I am new here so I hope I am posting this in the correct forum.