Ray Dashboard Setup on Windows

Issue with Ray Dashboard Setup on Windows

Prometheus

Hiya guys, I am new to Ray and am facing an issue with the Ray Dashboard. For context, running systeminfo gives:

OS Name:                   Microsoft Windows 11 Pro
OS Version:                10.0.22621 N/A Build 22621
System Type:               x64-based PC
Processor(s):              Intel64 Family 6 Model 165 Stepping 5 GenuineIntel ~2904 Mhz

When running the Ray Dashboard on Windows, the Prometheus and Grafana setup doesn't work the way the docs describe. The docs say that to run the dashboard, you need to run the following command:

./prometheus --config.file=/tmp/ray/session_latest/metrics/prometheus/prometheus.yml

I understand that the symbolic link /tmp/ray/session_latest points to the latest active Ray session. This doesn’t seem to work for me on Windows 11 (even after enabling symbolic links in my Git Bash setup). It results in:

msg="Error loading config (--config.file=C:/Users/Shav/AppData/Local/Temp/ray/session_latest/metrics/prometheus/prometheus.yml)" file=C:\Users\Shav\AppData\Local\Temp\ray\session_latest\metrics\prometheus\prometheus.yml err="open C:/Users/Shav/AppData/Local/Temp/ray/session_latest/metrics/prometheus/prometheus.yml: The system cannot find the path specified."

To get around this symbolic link issue, if I do the following to get the latest session:

CWD=$(pwd)
cd /tmp/ray
# Get the latest folder
SESSIONLATEST=$(ls -td */ | head -1)
cd $CWD/prometheus
./prometheus --config.file=/tmp/ray/$SESSIONLATEST/metrics/prometheus/prometheus.yml
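For anyone who would rather do the same lookup from Python than from Git Bash, here is a minimal sketch. It assumes Ray's temp root is the system temp directory plus "ray" (which matches the paths in the error above) and that the newest session is the one with the most recent modification time. The function name latest_ray_session is made up for illustration and is not part of Ray.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_ray_session(ray_tmp: Optional[Path] = None) -> Path:
    """Return the most recently modified session_* directory under Ray's temp root."""
    # Assumption: Ray writes its sessions to <system temp>/ray.
    root = ray_tmp if ray_tmp is not None else Path(tempfile.gettempdir()) / "ray"
    # Skip session_latest itself, since that is the link that fails to resolve.
    sessions = [
        p for p in root.glob("session_*")
        if p.name != "session_latest" and p.is_dir()
    ]
    if not sessions:
        raise FileNotFoundError(f"no session_* directories under {root}")
    # Pick the directory with the newest modification time.
    return max(sessions, key=lambda p: p.stat().st_mtime)
```

Calling latest_ray_session() / "metrics" / "prometheus" / "prometheus.yml" should then give the path to pass to --config.file.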

I hit another issue. It occurs both with this setup and when I create my own prometheus.yml file with the following contents:

# Prometheus config file
# my global config
global:
  scrape_interval: 2s
  evaluation_interval: 2s
# Scrape from Ray.
scrape_configs:
  - job_name: "ray"
    file_sd_configs:
      - files:
          - "C:/Users/Shav/Documents/Code/ray-prometheus/prometheus/prom_metrics_service_discovery.json"

When the config references the Prometheus service discovery file at /tmp/ray/prom_metrics_service_discovery.json directly, as in the docs, I get this error:

"Error reading file" path=C:/Users/Shav/AppData/Local/Temp/ray/prom_metrics_service_discovery.json err="open C:/Users/Shav/AppData/Local/Temp/ray/prom_metrics_service_discovery.json: The process cannot access the file because it is being used by another process.

I am assuming that Ray periodically updates this file with the addresses of all metrics agents in the cluster, and therefore it is in use by that process. Is there an argument I am missing to enable sharing?

So far I have started the dashboard with the following in an .ipynb notebook:

import ray
if ray.is_initialized():
    ray.shutdown()
ray.init()

In the end, I copied the file into my working directory:

cp -r C:/Users/Shav/AppData/Local/Temp/ray/prom_metrics_service_discovery.json .
./prometheus.exe --config.file=prometheus.yml

This fixes my issues.
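The same copy step can be sketched in Python with a few retries, in case the file happens to be locked at the moment of copying. This is only a sketch under assumptions: that a locked file surfaces as PermissionError on Windows, and that a short wait is enough for the lock to clear. snapshot_file is a hypothetical helper name.

```python
import shutil
import time
from pathlib import Path


def snapshot_file(src: Path, dst: Path, attempts: int = 5, delay_s: float = 0.5) -> Path:
    """Copy src to dst, retrying if the file is temporarily locked by its writer."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            shutil.copyfile(src, dst)
            return dst
        except PermissionError as exc:
            # Assumption: a file held open by another process raises PermissionError.
            last_error = exc
            time.sleep(delay_s)
    raise last_error
```

Note that the copy is a one-off snapshot. If Ray does rewrite the original file as assumed above, the copy would need refreshing whenever the set of metrics agents changes.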

Grafana

I faced similar issues creating a new Grafana server with the provided dashboards. The docs say to run:

./bin/grafana-server --config /tmp/ray/session_latest/metrics/grafana/grafana.ini web

Simply running this gives a similar error:

ERROR[09-19|13:27:35] failed to parse "C:/Users/Shav/AppData/Local/Temp/ray/session_latest/metrics/grafana/grafana.ini": open C:/Users/Shav/AppData/Local/Temp/ray/session_latest/metrics/grafana/grafana.ini: The system cannot find the path specified. logger=settings

So again, this time I run:

CWD=$(pwd)
cd /tmp/ray
# Get the latest folder
SESSIONLATEST=$(ls -td */ | head -1)
cd $CWD/grafana/
./bin/grafana-server --config C:/Users/Shav/AppData/Local/Temp/ray/$SESSIONLATEST/metrics/grafana/grafana.ini web

I get the same kind of not-found error:

ERROR[09-19|14:16:12] can't read datasource provisioning files from directory logger=provisioning.datasources path=C:\\Users\\Shav\\Documents\\Code\\ray-prometheus\\grafana\\tmp\\ray\\session_latest\\metrics\\grafana\\provisioning\\datasources error="open C:\\Users\\Shav\\Documents\\Code\\ray-prometheus\\grafana\\tmp\\ray\\session_latest\\metrics\\grafana\\provisioning\\datasources: The system cannot find the path specified."

This is because in the grafana.ini in metrics/grafana/ the paths are set to:

[paths]
provisioning = /tmp/ray/session_latest/metrics/grafana/provisioning

This path can't be resolved. I work around this by running:

CWD=$(pwd)
cd /tmp/ray
# Get the latest folder
FOLDER=$(ls -td */ | head -1)
cd $FOLDER/metrics/grafana/
# Replace 'provisioning = ...' with 'provisioning = "C:/Users/Shav/AppData/Local/Temp/ray/$FOLDER/metrics/grafana/provisioning"'
sed -i 's#provisioning = .*#provisioning = "C:/Users/Shav/AppData/Local/Temp/ray/'$FOLDER'/metrics/grafana/provisioning"#' grafana.ini
cat grafana.ini
# Back to CWD
cd $CWD/grafana/
./bin/grafana-server --config C:/Users/Shav/AppData/Local/Temp/ray/$FOLDER/metrics/grafana/grafana.ini web

This finally gets me up and running.
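The sed substitution above could also be done in Python, which avoids depending on Git Bash. A minimal sketch, assuming grafana.ini contains a single line starting with "provisioning =" as shown earlier. Both function names are made up for illustration.

```python
import re
from pathlib import Path


def set_provisioning_path(ini_text: str, provisioning_dir: str) -> str:
    """Replace the value of the 'provisioning = ...' line in grafana.ini text."""
    # A function replacement keeps backslashes in the new value from being
    # interpreted as regex escapes.
    return re.sub(
        r"(?m)^provisioning\s*=.*$",
        lambda _match: f'provisioning = "{provisioning_dir}"',
        ini_text,
    )


def rewrite_grafana_ini(ini_path: Path, provisioning_dir: str) -> None:
    """Apply the substitution to a grafana.ini file in place."""
    ini_path.write_text(set_provisioning_path(ini_path.read_text(), provisioning_dir))
```

The quoted value mirrors what the sed command writes. Pointing provisioning_dir at the real session folder rather than session_latest is what makes the path resolvable.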

Questions

Is this the intended behaviour on Windows? Is it an error in the documentation, or is it a bug? Am I missing something in terms of setup, or over-complicating things?

Thanks in advance. I am new here, so I hope I am posting this in the correct forum.

Do the paths exist if you check manually? For example, tmp/ray/session_latest/metrics/ ?

@sangcho @aguo for thoughts as well.

Hello! Thanks for responding.

So ls in tmp/ray/ returns:

prom_metrics_service_discovery.json       session_2023-09-14_15-13-43_353468_9472   session_2023-09-18_10-32-56_775611_15948  session_2023-09-19_10-15-38_925721_2708

etc…

And I can confirm that within these folders, e.g. /tmp/ray/session_2023-09-14_15-13-43_353468_9472/metrics, I have the directories:

grafana  prometheus

The issue is just the “session_latest” symbolic link, I think. For example, if I cd into /tmp/ray/session_2023-09-14_15-13-43_353468_9472/metrics it works, but with /tmp/raysession_latest/metrics it doesn’t.

$ cd /tmp/raysession_latest/metrics
bash: cd: /tmp/raysession_latest/metrics: No such file or directory
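To tell apart "the link was never created" from "the link exists but points nowhere", a small check like the following might help. It is only a diagnostic sketch, and describe_path is a hypothetical helper name.

```python
from pathlib import Path


def describe_path(path: Path) -> str:
    """Classify a path as a symlink, broken symlink, directory, file, or missing."""
    if path.is_symlink():
        # exists() follows the link, so it is False when the target is gone.
        return "symlink" if path.exists() else "broken symlink"
    if path.is_dir():
        return "directory"
    if path.exists():
        return "file"
    return "missing"
```

Running it on the session_latest path under the Ray temp folder would show whether Ray created the link at all.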

Hmmm. Not an expert on this topic. It seems that the symbolic link created by Ray just doesn’t work on Windows? The /tmp/ray/session_latest just doesn’t exist?

@Shav can you file a GitHub issue and include your Windows version, Ray version, etc., and how to reproduce this?

@Shav it is /tmp/ray/session_latest, not /tmp/raysession_latest?

Sorry, that was a typo on my side. With the correct path:

$ cd /tmp/ray/session_latest
bash: cd: /tmp/ray/session_latest: No such file or directory

This doesn't work either, is what I meant. My bad.

Hmm, that’s pretty odd. When Ray first starts, it creates a symbolic link. When you access session_latest, it doesn’t work at all? Maybe the symbolic link implementation is not working in Ray on Windows.

I was facing the same issue. You need admin rights on Windows to create symbolic links, even in the current user's temp folder. Just run the code with admin rights, then use “C:\Users\UTILIS~1\AppData\Local\Temp\ray\session_latest”.

For grafana.ini, go into the conda environment or virtual environment and modify the default ini config here:

Lib\site-packages\ray\dashboard\modules\metrics\export\grafana

Then set the provisioning path to C:/Users/UTILIS~1/AppData/Local/Temp/ray/session_latest/metrics/grafana/provisioning

Do the same thing for Prometheus if you like.
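To check the admin-rights point before starting Ray, one option is to try creating a throwaway directory symlink and see whether the OS allows it. This is a rough sketch, assuming that a missing privilege shows up as an OSError from os.symlink. can_create_symlinks is a made-up name.

```python
import os
import tempfile
from pathlib import Path


def can_create_symlinks() -> bool:
    """Return True if this process can create a directory symlink in a temp folder."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "target"
        link = Path(tmp) / "link"
        target.mkdir()
        try:
            os.symlink(target, link, target_is_directory=True)
        except (OSError, NotImplementedError):
            # Assumption: lacking the symlink privilege raises OSError here.
            return False
        # The link should resolve to the directory it points at.
        return link.is_dir()
```

If this returns False in a normal shell and True in an elevated one, that would be consistent with session_latest only appearing when Ray is run with admin rights.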