I modified the example-full.yaml file to add the following information:
- Head IP Address
- Worker IP Addresses
- ssh_user
I used the same file to deploy a Ray cluster, which works fine.
If I use that same file on a different machine to start a dashboard so I can submit a job, it fails:
ray dashboard example-full.yaml
In fact, it fails for all ray CLI commands (e.g., exec), but I can connect from a different machine in Python using ray.init(address=head_node_ip_address). What am I doing wrong here?
cc: @sangcho @aguo for ideas
Can you please post the error message?
Attempting to establish dashboard locally at http://localhost:8265/ connected to remote port 8265
2023-02-01 19:26:19,591 VWARN commands.py:337 -- Loaded cached provider configuration from /tmp/ray-config-fafd691b0a09741a8227e501f5bc38e2843d0db8
2023-02-01 19:26:19,591 WARN commands.py:345 -- If you experience issues with the cloud provider, try re-running the command with --no-config-cache.
2023-02-01 19:26:19,591 INFO node_provider.py:54 -- ClusterState: Loaded cluster state: ['ip_address_1', 'ip_address_2', 'ip_address_3']
Error: Failed to forward dashboard from remote port 8265 to local port 8265. There are a couple possibilities:
- The remote port is incorrectly specified
- The local port 8265 is already in use.
The exception is: Head node of cluster (default) not found!
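One of the two possibilities the error lists, a busy local port, can be ruled out quickly. Below is a minimal sketch using only Python's standard library. The helper name `local_port_in_use` is hypothetical and not part of Ray, and it only checks for a listener on 127.0.0.1, so treat it as a rough check rather than a full diagnosis.

```python
import socket


def local_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port.

    Hypothetical helper for illustration; not part of the Ray CLI or API.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds,
        # which means a listener already occupies the port.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 8265 is the local dashboard port named in the error above.
    print(local_port_in_use(8265))
```

If this prints False, a port conflict is unlikely to be the cause, which leaves the "Head node of cluster (default) not found!" exception as the part to investigate.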
I am getting the same error.