Certain Ray CLI commands complain about invalid GCS address

jleben · January 23, 2025, 1:08am

Certain Ray CLI commands that take the --address argument (or the RAY_ADDRESS env var) complain like this:

ValueError: Invalid gcs_address: http://10.212.131.238:8265/

This is the case at least for: ray monitor and ray status. However, ray job submit and ray summary tasks for example work fine.

What might be the reason for the error and how can I avoid it?

christina · January 23, 2025, 10:17pm

Hello! I did a bit of searching on the docs and I think this is what’s going on here.

The ray monitor and ray status commands expect a direct address to the Ray cluster, typically in the format ip:port (e.g., 10.0.0.1:6379), rather than a URL with a protocol like http://. ray job submit and ray summary, on the other hand, are designed to talk to http:// endpoints, which is why they don’t throw an error.

For the commands where there is an error, they are expecting a different format, such as 10.212.131.238:8265 instead of http://10.212.131.238:8265/. This should resolve the issue for commands like ray monitor and ray status.

So, I think you can try using the http:// URL for any dashboard commands, and the <ip>:<port> format for any cluster interactions. Let me know if this fixes the issue for you.

jleben · January 24, 2025, 1:46am

Hi Christina,

Thank you for your reply. I tried ray status --address 10.212.131.238:8265 and got the error:

[2025-01-24 01:44:58,524 E 2803 2803] gcs_rpc_client.h:179: Failed to connect to GCS at address 10.212.131.238:8265 within 5 seconds

jleben · January 24, 2025, 1:48am

I also get the same error from ray status --address 10.212.131.238:6379 (using the port that was set in ray start --head --port=6379 ...)

christina · January 24, 2025, 2:52am

Hmmm, I’m assuming your Ray cluster is already running when you’re trying to connect?

Do you have any firewalls or network restrictions that might block access to the port? Can you check whether the IP and port are reachable from the machine where you run the command?
Are there any errors in the Ray logs at /tmp/ray/session_latest/logs/ (or wherever your logs are stored)?
Is it possible that another process is using the port?
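
Note that ping only shows that the host answers; it says nothing about a specific TCP port. As a rough sketch of a port check, assuming bash and the coreutils timeout command are available on the client machine (port_open is a hypothetical helper name, not part of Ray):

```shell
# Hypothetical helper: succeeds if a TCP connection to host:port
# can be opened within 3 seconds, fails otherwise.
port_open() {
  local host="$1" port="$2"
  # bash's /dev/tcp pseudo-device attempts a TCP connect on open;
  # host and port are passed as $0 and $1 to the inner shell.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example usage:
# port_open 10.212.131.238 6379 && echo "reachable" || echo "blocked or closed"
```

A failure here does not distinguish between a firewall dropping the traffic and nothing listening on the port, so the Ray logs are still worth checking.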

jleben · January 24, 2025, 7:47am

Thanks for pointing me in the right direction Christina!

I had to allow connections on port 6379 for ray status --address 10.212.154.239:6379 to work.

Curiously, I also had to allow connections on port 41823 for ray memory --address 10.212.154.239:6379 to work. However, it seems this port is chosen randomly at start, so after restarting my cluster the command failed again. Do you know perhaps which of the ray start options mentioned here would control this port so I can make it constant?

I should say it’s rather unexpected and impractical that ray status and ray job submit expect the address in different formats, even though both will use the address from the same environment variable if it is not given on the command line. This makes it impossible to avoid the --address option for a subset of commands.
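
One possible way around the format mismatch is to keep the http:// URL in RAY_ADDRESS and derive the ip:port form from it for the commands that need it. A minimal sketch, assuming the GCS listens on port 6379 as set by ray start --head --port=6379 (gcs_address_from_url is a hypothetical helper, not a Ray command, and it does not handle IPv6 literals):

```shell
# Hypothetical helper: convert a dashboard-style URL such as
# http://10.212.131.238:8265/ into ip:port with the GCS port swapped in.
gcs_address_from_url() {
  local url="$1"
  local gcs_port="${2:-6379}"    # assumed GCS port; override with a 2nd argument
  local hostport="${url#*://}"   # drop the scheme, if any
  hostport="${hostport%%/*}"     # drop a trailing slash or path
  local host="${hostport%%:*}"   # drop the dashboard port
  printf '%s:%s\n' "$host" "$gcs_port"
}

# Example usage (needs a reachable cluster):
# ray status --address "$(gcs_address_from_url "$RAY_ADDRESS")"
```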

If someone finds this helpful: as an alternative to specifying --address, I noticed that all of these commands work without the --address option or the RAY_ADDRESS env var when run on the head node, which can be done from outside the cluster via ray exec, for example ray exec config.yaml 'ray status'. Some may find this more convenient than having to remember what to pass as --address.

Thank you for your help finding workarounds to my troubles!

christina · January 27, 2025, 11:54pm

Hi! I’m glad to hear you solved the port problems!

As for the ray start options, I think these would help make the port more constant:

--min-worker-port: Minimum port number worker can be bound to. Default: 10002.
--max-worker-port: Maximum port number worker can be bound to. Default: 19999.

So, for example, this will allow you to set a specific range of ports that Ray can use, making it easier to manage and predict which ports need to be open. Here is how you can specify these options:

```bash
ray start --head --port=6379 --min-worker-port=10000 --max-worker-port=10010
```

This will limit the worker ports to the range 10000-10010, and you can open these ports in your firewall settings. If you want to pin the workers to a single port such as 41823, I guess you could set the min and max to the same number?
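
The worker port range may not cover the port that ray memory needed. ray start also accepts --node-manager-port and --object-manager-port, both of which are picked at random unless set. A sketch, assuming (not confirmed in this thread) that port 41823 was one of these two; the values 8076 and 8077 are arbitrary choices for illustration:

```shell
# Pin the otherwise random raylet ports on the head node so the
# firewall rules survive a cluster restart.
# 8076 and 8077 are arbitrary example values, not required by Ray.
ray start --head --port=6379 \
  --node-manager-port=8076 \
  --object-manager-port=8077 \
  --min-worker-port=10000 \
  --max-worker-port=10010
```

If the command still fails after a restart, the Ray logs should show which port the client tried to reach, and that port can then be pinned or opened as well.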
