How severely does this issue affect your experience of using Ray?
- Medium: It causes significant difficulty in completing my task, but I can work around it.
I am creating a Ray cluster on AWS. An EC2 instance running Amazon Linux 2023 serves as my “launcher”: it has Python 3.9.16 with ray[default]==2.6.1 installed in a virtual environment, plus a ray_cluster.yaml file containing the cluster setup parameters.
My local machine is a Windows 11 laptop.
Scenario 1: working scenario with public IPs
Instances in my subnet are auto-assigned public IPv4 addresses.
I SSH into my launcher instance, activate the virtual environment, and run ray up ray_cluster.yaml
My cluster is successfully launched.
I then forward the dashboard port directly from the head node to my local machine, using the following command in a terminal on my local machine:
ssh -L 8265:localhost:8265 -i C:\keyfilelocation\keyfile_local.pem ec2-user@ec2-x-x-x-x.us-east-2.compute.amazonaws.com
where ec2-x-x-x-x.us-east-2.compute.amazonaws.com is the public IPv4 DNS name of my newly created head node.
I can then open http://127.0.0.1:8265/#overview in my local browser and view the Ray dashboard.
Scenario 2: non-working scenario with private IPs
Instances in my subnet are not auto-assigned public IP addresses.
I SSH into my launcher instance, activate the virtual environment, and run ray up ray_cluster.yaml
My cluster is successfully launched.
Since my head node has no public IP address, I can’t forward port 8265 directly. I expected to be able to do a double port forward through my launcher, which does have a public IPv4 address and is in the same subnet as the head node.
First, I ran this on my local machine to connect to the launcher EC2 instance, forwarding local port 8265 to port 8265 on the launcher:
ssh -L 8265:localhost:8265 -i C:\keyfilelocation\keyfile_local.pem ec2-user@ec2-x-x-x-x.us-east-2.compute.amazonaws.com
Then, from within that SSH session on the launcher, I ran:
ssh -L 8265:localhost:8265 -i ~/.ssh/ray-autoscaler_1_us-east-2.pem ec2-user@172.22.33.213
Note that the two key files are different; I don’t know whether this matters.
Now, in my local browser, I open http://127.0.0.1:8265/#overview
The dashboard appears for about a second, then disappears and leaves a blank page. Nothing else happens after that.
If I open only one of the two SSH hops, or neither, the dashboard URL instead gives a “This site can’t be reached” error, which is different from the dashboard briefly appearing. So the double hop does seem to establish at least a partial connection, but something appears to block it afterward.
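For comparison, the two hops can also be collapsed into a single command run on the local machine, using an OpenSSH ProxyCommand so that each hop authenticates with its own key. This is a minimal, untested sketch, and it is not known whether it avoids the blank-page behavior. It assumes a bash-compatible shell on the Windows laptop (for example Git Bash or WSL), that the autoscaler key has been copied from the launcher's ~/.ssh to the local machine, and that the key paths contain no spaces. The local key paths and the RUN variable are hypothetical conventions of this sketch, not anything from Ray.

```shell
#!/usr/bin/env bash
# Sketch: forward the head node's dashboard port to the local machine in one
# command, tunnelling through the launcher. All values below are placeholders.
set -euo pipefail

LAUNCHER="ec2-user@ec2-x-x-x-x.us-east-2.compute.amazonaws.com"  # launcher (public DNS)
HEAD="ec2-user@172.22.33.213"                                     # head node (private IP)
LAUNCHER_KEY="$HOME/keys/keyfile_local.pem"                       # hypothetical local path
HEAD_KEY="$HOME/keys/ray-autoscaler_1_us-east-2.pem"              # hypothetical: copied from the launcher
PORT=8265                                                         # Ray dashboard port

# The inner ssh (ProxyCommand) logs in to the launcher with its own key and
# relays the raw TCP stream to the head node's SSH port (-W %h:%p).
# The outer ssh logs in to the head node with the autoscaler key, runs no
# remote command (-N), and forwards local PORT to the head node's localhost:PORT.
# Assumption: key paths contain no spaces (ProxyCommand is run through a shell).
cmd=(
  ssh
  -i "$HEAD_KEY"
  -o "ProxyCommand=ssh -i $LAUNCHER_KEY -W %h:%p $LAUNCHER"
  -N
  -L "${PORT}:localhost:${PORT}"
  "$HEAD"
)

# Print the command by default; set RUN=1 to actually open the tunnel.
if [[ "${RUN:-0}" == "1" ]]; then
  "${cmd[@]}"
else
  printf '%q ' "${cmd[@]}"
  echo
fi
```

Running the script without RUN prints the assembled command for inspection; running it with RUN=1 keeps the tunnel open until interrupted, after which http://127.0.0.1:8265 would be opened in the local browser as before. A ProxyCommand is used rather than ssh -J because, per the OpenSSH documentation, identity options given on the command line apply to the destination host and not to jump hosts. Separately, on the launcher itself, Ray's ray dashboard ray_cluster.yaml command port-forwards the head node's dashboard and could stand in for the manual second hop.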
Has anybody done something similar and been able to solve this?