Ray dashboard shows only briefly with double SSH hop

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I am creating a Ray cluster on AWS. I have an EC2 instance running Amazon Linux 2023 that serves as my “launcher”. It has Python 3.9.16 with ray[default]==2.6.1 installed in a virtual environment, and a ray_cluster.yaml containing the parameters used to set up the cluster.
My local machine is a Windows 11 laptop.

Scenario 1: working scenario with public IPs
Instances in my subnet are auto-assigned public IPv4 addresses.
I SSH into my launcher instance, activate the virtual environment and run ray up ray_cluster.yaml
My cluster is successfully launched.
I now directly forward the dashboard port from the head node to my local machine using the following command from a terminal on my local machine:

ssh -L 8265:localhost:8265 -i C:\keyfilelocation\keyfile_local.pem ec2-user@ec2-x-x-x-x.us-east-2.compute.amazonaws.com

where ec2-x-x-x-x.us-east-2.compute.amazonaws.com is the public IPv4 DNS name of my newly created head node.
I can then open http://127.0.0.1:8265/#overview in my local browser and view the Ray dashboard.

Scenario 2: non-working scenario with private IPs
Instances in my subnet are not auto-assigned public IP addresses.
I SSH into my launcher instance, activate the virtual environment and run ray up ray_cluster.yaml
My cluster is successfully launched.
Since my head node has no public IP address, I can’t forward port 8265 directly. I thought I should be able to do a double port forward through my launcher (which does have a public IPv4 address and is in the same subnet as the head node).
So I first did this to get into the launcher EC2 instance:

ssh -L 8265:localhost:8265 -i C:\keyfilelocation\keyfile_local.pem ec2-user@ec2-x-x-x-x.us-east-2.compute.amazonaws.com

And then from within the launcher instance I did:

ssh -L 8265:localhost:8265 -i ~/.ssh/ray-autoscaler_1_us-east-2.pem ec2-user@172.22.33.213

Note that the two key files are different; I don’t know whether this matters.

Now from my local browser I open http://127.0.0.1:8265/#overview
I can briefly (for about a second) see the dashboard, then it vanishes and I see a blank screen. And that’s it.

If I do only one or none of the SSH hops, then attempting to access the dashboard URL gives me a “This site can’t be reached” error, which is different from briefly showing the dashboard. So I know that the double SSH hop at least partly makes the connection, but something then seems to block it.

Has anybody done something similar and been able to solve this?
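
For anyone reproducing Scenario 2, the two hops can be collapsed into a single command run from the laptop, which at least rules out a stale or half-closed first tunnel. This is a minimal sketch and not a confirmed fix. The host names and key paths are the placeholders from the commands above, build_tunnel_cmd is a hypothetical helper name, and the script only prints the command so it can be checked before it is run.

```shell
#!/bin/sh
# Placeholders copied from the commands above; substitute real values.
LAUNCHER='ec2-user@ec2-x-x-x-x.us-east-2.compute.amazonaws.com'
HEAD='ec2-user@172.22.33.213'
LOCAL_KEY='C:\keyfilelocation\keyfile_local.pem'      # key stored on the laptop
HEAD_KEY='~/.ssh/ray-autoscaler_1_us-east-2.pem'      # key stored on the launcher
PORT=8265

# build_tunnel_cmd is a hypothetical helper: it only prints the command.
# Outer -L: laptop:PORT -> launcher:PORT.
# Inner -L: launcher:PORT -> head node:PORT.
# -t gives the inner ssh a terminal; the inner command is quoted so that
# ~ is expanded on the launcher rather than on the laptop.
build_tunnel_cmd() {
  printf 'ssh -t -L %s:localhost:%s -i %s %s "ssh -L %s:localhost:%s -i %s %s"\n' \
    "$PORT" "$PORT" "$LOCAL_KEY" "$LAUNCHER" \
    "$PORT" "$PORT" "$HEAD_KEY" "$HEAD"
}

build_tunnel_cmd
```

Another option for the second hop, run on the launcher, is ray dashboard ray_cluster.yaml, which port-forwards the cluster's dashboard to the machine it is run on. Whether either variant avoids the blank screen is untested here.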

This post describes a similar problem: Error 403 for ray dashboard on localhost in ray 2.8.0 on Windows11 - #11 by PhilippWillms

And when I checked the Network tab within the Inspect panel of my browser, I did see that some of the calls were resulting in 503 errors.
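
To see which hop the 503 responses come from, one option is to request the same URL at each stage (on the head node, on the launcher with the inner tunnel open, and on the laptop) and compare the status codes. The sketch below assumes curl is available on each machine. probe and classify are hypothetical helper names, and the interpretations are generic HTTP meanings rather than anything specific to Ray.

```shell
#!/bin/sh
# probe: print the HTTP status code for a URL.
# curl prints 000 when it cannot connect at all; -m 5 caps the wait at 5 seconds.
probe() {
  curl -s -o /dev/null -m 5 -w '%{http_code}' "$1"
}

# classify: map a status code to a rough, generic interpretation.
classify() {
  case "$1" in
    000)     echo "no connection: the tunnel or the dashboard is not reachable" ;;
    2??|3??) echo "ok" ;;
    503)     echo "server reached, but it reports a service as unavailable" ;;
    *)       echo "unexpected status $1" ;;
  esac
}

# Example, run on each machine in turn:
#   classify "$(probe http://127.0.0.1:8265/)"
```

The dashboard root may answer normally even when individual calls fail, so it is more informative to pass probe one of the exact URLs that showed a 503 in the Network tab.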

It may indeed be helpful to add a comment on [Dashboard] Error 403 for ray dashboard on localhost in ray 2.8.0 · Issue #41379 · ray-project/ray · GitHub

The fix has been merged: [Dashboard] Fix Path Resolution on Windows by ijrsvt · Pull Request #41388 · ray-project/ray · GitHub. It will be available in Ray 2.9, which is due in mid-December.

@Huaiwei_Sun thanks for taking a look at this and for linking the other discussion. I’ll test the fix when it is available, but do you think this is the same problem?

As noted in my original post, Ray itself is installed on Linux (Amazon Linux 2023). The instance used to launch Ray (where I run ray up) and the head/worker nodes are all Linux. Yes, my laptop is Windows 11, but I am only port-forwarding to the Ray instance and then opening the dashboard in the laptop's browser.