Can't access dashboard or find process listening on port 8265

I am trying to access the Ray dashboard, but nothing seems to be serving it. Here is my Python script:

import logging
import time

import ray

print("ray version:", ray.__version__)
ray.init(include_dashboard=True, logging_level=logging.DEBUG)
tick = 0  # named "tick" to avoid shadowing the built-in iter()
while True:
    time.sleep(2)
    print(f"tick[{tick}]")
    tick += 1

Output:

ray version: 2.4.0
2023-06-12 18:28:38,600	DEBUG worker.py:1376 -- Automatically increasing RLIMIT_NOFILE to max value of 524288
2023-06-12 18:28:38,629	DEBUG node.py:1141 -- Process STDOUT and STDERR is being redirected to /home/spear3/py/try/tryray/ray/session_2023-06-12_18-28-38_623575_14599/logs.
2023-06-12 18:28:38,637	DEBUG gcs_utils.py:300 -- internal_kv_get b'dummy' None
2023-06-12 18:28:38,638	DEBUG gcs_utils.py:218 -- Failed to send request to gcs, reconnecting. Error <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:192.168.50.143:60174: Failed to connect to remote host: Connection refused"
	debug_error_string = "UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:192.168.50.143:60174: Failed to connect to remote host: Connection refused {created_time:"2023-06-12T18:28:38.638675289+02:00", grpc_status:14}"
>
2023-06-12 18:28:39,640	DEBUG gcs_utils.py:300 -- internal_kv_get b'dummy' None
2023-06-12 18:28:39,642	DEBUG gcs_utils.py:342 -- internal_kv_put b'CLUSTER_METADATA' b'{"ray_version": "2.4.0", "python_version": "3.10.11"}' True b'cluster'
2023-06-12 18:28:39,642	DEBUG gcs_utils.py:342 -- internal_kv_put b'session_name' b'session_2023-06-12_18-28-38_623575_14599' True b'session'
2023-06-12 18:28:39,643	DEBUG gcs_utils.py:342 -- internal_kv_put b'session_dir' b'/home/spear3/py/try/tryray/ray/session_2023-06-12_18-28-38_623575_14599' True b'session'
2023-06-12 18:28:39,643	DEBUG gcs_utils.py:342 -- internal_kv_put b'temp_dir' b'/home/spear3/py/try/tryray/ray' True b'session'
2023-06-12 18:28:39,816	DEBUG gcs_utils.py:300 -- internal_kv_get b'dashboard' b'dashboard'
2023-06-12 18:28:39,918	DEBUG gcs_utils.py:300 -- internal_kv_get b'dashboard' b'dashboard'
2023-06-12 18:28:40,019	DEBUG gcs_utils.py:300 -- internal_kv_get b'dashboard' b'dashboard'
2023-06-12 18:28:40,120	DEBUG gcs_utils.py:300 -- internal_kv_get b'dashboard' b'dashboard'
2023-06-12 18:28:40,221	DEBUG gcs_utils.py:300 -- internal_kv_get b'dashboard' b'dashboard'
2023-06-12 18:28:40,322	DEBUG gcs_utils.py:300 -- internal_kv_get b'dashboard' b'dashboard'
2023-06-12 18:28:40,423	DEBUG gcs_utils.py:300 -- internal_kv_get b'dashboard' b'dashboard'
2023-06-12 18:28:40,524	DEBUG gcs_utils.py:300 -- internal_kv_get b'dashboard' b'dashboard'
2023-06-12 18:28:40,625	DEBUG gcs_utils.py:300 -- internal_kv_get b'dashboard' b'dashboard'
2023-06-12 18:28:40,727	DEBUG gcs_utils.py:300 -- internal_kv_get b'dashboard' b'dashboard'
2023-06-12 18:28:40,828	DEBUG gcs_utils.py:300 -- internal_kv_get b'dashboard' b'dashboard'
2023-06-12 18:28:40,829	DEBUG gcs_utils.py:342 -- internal_kv_put b'webui:url' b'127.0.0.1:8265' True b'dashboard'
2023-06-12 18:28:40,829	DEBUG node.py:1179 -- Process STDOUT and STDERR is being redirected to /home/spear3/py/try/tryray/ray/session_2023-06-12_18-28-38_623575_14599/logs.
2023-06-12 18:28:40,832	DEBUG services.py:1884 -- Determine to start the Plasma object store with 1.88 GB memory using /dev/shm.
2023-06-12 18:28:40,956	DEBUG gcs_utils.py:342 -- internal_kv_put b'extra_usage_tag_gcs_storage' b'memory' True b'usage_stats'
2023-06-12 18:28:40,957	INFO worker.py:1616 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
2023-06-12 18:28:40,960	DEBUG gcs_utils.py:300 -- internal_kv_get b'CLUSTER_METADATA' b'cluster'
2023-06-12 18:28:42,642	DEBUG gcs_utils.py:300 -- internal_kv_get b'IsolatedExports:01000000:\x00\x00\x00\x00\x00\x00\x00\x01' b'fun'
2023-06-12 18:28:42,643	DEBUG gcs_utils.py:300 -- internal_kv_get b'IsolatedExports:01000000:\x00\x00\x00\x00\x00\x00\x00\x01' b'fun'
2023-06-12 18:28:42,644	DEBUG gcs_utils.py:300 -- internal_kv_get b'__autoscaling_error' None
2023-06-12 18:28:42,644	DEBUG gcs_utils.py:300 -- internal_kv_get b'tracing_startup_hook' b'tracing'
2023-06-12 18:28:42,645	DEBUG gcs_utils.py:342 -- internal_kv_put b'extra_usage_tag_gcs_storage' b'memory' True b'usage_stats'
tick[0]
tick[1]
tick[2]
tick[3]
tick[4]
...

I can't find any process listening on port 8265. Both of these commands come back with nothing:
$ sudo lsof -nP -iTCP -sTCP:LISTEN | grep 8265
$ sudo ss -tunlp | grep 8265

There are some processes shown in ss -tunlp, but none of them are associated with port 8265:

Netid           State            Recv-Q           Send-Q                       Local Address:Port                        Peer Address:Port           Process
tcp             LISTEN           0                128                                0.0.0.0:52365                            0.0.0.0:*               users:(("python3",pid=15521,fd=15))              
tcp             LISTEN           0                5                                  0.0.0.0:44217                            0.0.0.0:*               users:(("python3",pid=15352,fd=8))                            
tcp             LISTEN           0                5                                  0.0.0.0:47638                            0.0.0.0:*               users:(("python3",pid=15521,fd=13))              
tcp             LISTEN           0                4096                                     *:38435                                  *:*               users:(("ray::IDLE",pid=15549,fd=15))            
tcp             LISTEN           0                4096                                     *:42939                                  *:*               users:(("ray::IDLE",pid=15552,fd=14))            
tcp             LISTEN           0                4096                                     *:61854                                  *:*               users:(("gcs_server",pid=15317,fd=20))           
tcp             LISTEN           0                4096                                     *:41805                                  *:*               users:(("ray::IDLE",pid=15548,fd=14))            
tcp             LISTEN           0                4096                                     *:46024                                  *:*               users:(("python3",pid=15521,fd=12))              
tcp             LISTEN           0                4096                                     *:44117                                  *:*               users:(("ray::IDLE",pid=15545,fd=14))            
tcp             LISTEN           0                4096                                     *:40053                                  *:*               users:(("ray::IDLE",pid=15547,fd=14))            
tcp             LISTEN           0                4096                                     *:36355                                  *:*               users:(("ray::IDLE",pid=15546,fd=14))            
tcp             LISTEN           0                4096                                     *:36447                                  *:*               users:(("raylet",pid=15443,fd=30))               
tcp             LISTEN           0                4096                                     *:40553                                  *:*               users:(("ray::IDLE",pid=15550,fd=14))                           
tcp             LISTEN           0                4096                                     *:43301                                  *:*               users:(("python3",pid=15316,fd=17))              
tcp             LISTEN           0                4096                                     *:43339                                  *:*               users:(("ray::IDLE",pid=15551,fd=14))                           
tcp             LISTEN           0                4096                                     *:39633                                  *:*               users:(("raylet",pid=15443,fd=24))
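
The same check can be made from Python without lsof or ss by attempting a TCP connection to the dashboard address. This is only a minimal sketch: is_port_open is a hypothetical helper (not part of Ray), and the host and port are taken from the "View the dashboard at 127.0.0.1:8265" log line above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (for example ConnectionRefusedError)
        # when nothing is accepting connections on that address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("dashboard reachable:", is_port_open("127.0.0.1", 8265))
```

A False result here is consistent with the empty lsof and ss output: the log line advertises the URL, but no process has actually bound the port.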

By the way, the link to the Ray Dashboard documentation is broken (404 Not Found): https://docs.ray.io/en/latest/ray-core/ray-dashboard.html

Found some more information looking at dashboard.err:

Traceback (most recent call last):
  File "/nix/store/jnkjfn6s5g8g2r3ii45hg3j3vg0lgi0j-python3-3.10.11-env/lib/python3.10/site-packages/ray/dashboard/dashboard.py", line 240, in <module>
    raise e
  File "/nix/store/jnkjfn6s5g8g2r3ii45hg3j3vg0lgi0j-python3-3.10.11-env/lib/python3.10/site-packages/ray/dashboard/dashboard.py", line 228, in <module>
    loop.run_until_complete(dashboard.run())
  File "/nix/store/95cxzy2hpizr23343b8bskl4yacf4b3l-python3-3.10.11/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/nix/store/jnkjfn6s5g8g2r3ii45hg3j3vg0lgi0j-python3-3.10.11-env/lib/python3.10/site-packages/ray/dashboard/dashboard.py", line 70, in run
    await self.dashboard_head.run()
  File "/nix/store/jnkjfn6s5g8g2r3ii45hg3j3vg0lgi0j-python3-3.10.11-env/lib/python3.10/site-packages/ray/dashboard/head.py", line 329, in run
    await asyncio.gather(*concurrent_tasks, *(m.run(self.server) for m in modules))
  File "/nix/store/jnkjfn6s5g8g2r3ii45hg3j3vg0lgi0j-python3-3.10.11-env/lib/python3.10/site-packages/ray/dashboard/modules/metrics/metrics_head.py", line 305, in run
    self._create_default_grafana_configs()
  File "/nix/store/jnkjfn6s5g8g2r3ii45hg3j3vg0lgi0j-python3-3.10.11-env/lib/python3.10/site-packages/ray/dashboard/modules/metrics/metrics_head.py", line 200, in _create_default_grafana_configs
    os.makedirs(
  File "/nix/store/95cxzy2hpizr23343b8bskl4yacf4b3l-python3-3.10.11/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/nix/store/95cxzy2hpizr23343b8bskl4yacf4b3l-python3-3.10.11/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/spear3/py/try/tryray/ray/session_2023-06-12_19-03-01_405672_27244/metrics/grafana/provisioning'

I believe the dashboard is trying to create the provisioning directory, but the parent grafana directory is read-only, so it has no write permission there:

$ ls -lh /home/spear3/py/try/tryray/ray/session_2023-06-12_19-03-01_405672_27244/metrics
total 4.0K
dr-xr-xr-x 2 spear3 users 4.0K Jan  1  1970 grafana
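
The listing shows the mechanism: grafana has mode dr-xr-xr-x, so even its owner cannot create provisioning inside it. The sketch below walks a session directory and reports every directory whose owner write bit is missing. find_readonly_dirs is a hypothetical helper, not part of Ray, and it looks at mode bits only (it does not account for ACLs or read-only mounts).

```python
import os
import stat


def find_readonly_dirs(root: str) -> list[str]:
    """Return directories under root (inclusive) lacking the owner write bit."""
    readonly = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        mode = os.stat(dirpath).st_mode
        # S_IWUSR is the "owner may write" permission bit.
        if not mode & stat.S_IWUSR:
            readonly.append(dirpath)
    return sorted(readonly)
```

Run against the session directory from the traceback, this would be expected to report the metrics/grafana directory. The Jan 1 1970 timestamp suggests the directory was copied out of the read-only Nix store with its permissions preserved, although that is an inference from the listing and not something confirmed in this thread.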

Sorry about the broken links. We refactored the dashboard doc in 2.5 but the redirects are not set correctly. We’ll fix them soon.

For now, you can take a look at these two links

Here are some questions for me to better understand your setup:

  • How do you launch the cluster? Is it local, the VM launcher, or KubeRay?
  • Did you install "ray[default]", "ray[air]", or another variant that includes the dashboard component? See "Installing Ray — Ray 2.5.0".

Thanks for the reply. It only seems to happen with the ray 2.4.0 package from the NixOS repository. If I install ray 2.5.0 from pip, the dashboard works once I use patchelf to fix up the interpreter path of the gcs_server and raylet executables.

Glad that you figured it out. Is there anything we should change to make it work out of the box?