Set the environment of timeout in ray.init() for _wait_and_get_for_node_address

jinyong.choi · March 12, 2024, 6:02am

Hi there,
It would be more convenient to have a timeout parameter when connecting, as shown below.

What if we could easily adjust this timeout through an environment variable read by _wait_and_get_for_node_address? Its 60-second default is too long for me.

Would this modification go against Ray’s design?

import ray
ray.init(address="127.0.0.1:6379")

In node.py:
        node_ip_address = ray_params.node_ip_address
        if node_ip_address is None:
            if connect_only:
                node_ip_address = self._wait_and_get_for_node_address()
            else:
                node_ip_address = ray.util.get_node_ip_address()


The helper it calls has a hard-coded 60-second default:

def _wait_and_get_for_node_address(self, timeout_s: int = 60) -> str:
    ...
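A minimal sketch of the proposal, assuming a hypothetical environment variable named RAY_NODE_ADDRESS_TIMEOUT_S (this is not an existing Ray setting): the 60-second default is kept, but an operator can shorten it without changing the call site. The polling loop is a generic deadline pattern for illustration, not Ray’s actual implementation of _wait_and_get_for_node_address, and both function names are hypothetical.

```python
import os
import time
from typing import Callable, Optional

# Hypothetical variable name; Ray does not read this today.
TIMEOUT_ENV_VAR = "RAY_NODE_ADDRESS_TIMEOUT_S"
DEFAULT_TIMEOUT_S = 60.0


def resolve_timeout_s(default: float = DEFAULT_TIMEOUT_S) -> float:
    """Return the timeout from the environment, falling back to `default`."""
    raw = os.environ.get(TIMEOUT_ENV_VAR)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"{TIMEOUT_ENV_VAR} must be a number, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{TIMEOUT_ENV_VAR} must be positive, got {value}")
    return value


def wait_for_value(
    poll: Callable[[], Optional[str]],
    timeout_s: Optional[float] = None,
    interval_s: float = 0.5,
) -> str:
    """Call `poll` until it returns a non-None value or the deadline passes."""
    if timeout_s is None:
        # Read the (hypothetical) environment override on every call.
        timeout_s = resolve_timeout_s()
    deadline = time.monotonic() + timeout_s
    while True:
        value = poll()
        if value is not None:
            return value
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"no value after {timeout_s} seconds")
        # Never sleep past the deadline.
        time.sleep(min(interval_s, remaining))
```

With something like this in place, running RAY_NODE_ADDRESS_TIMEOUT_S=5 python driver.py would make a failed lookup give up after about 5 seconds, while everyone who does not set the variable keeps the current 60-second behavior.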

jjyao · March 17, 2024, 4:53am

Hi,

@jinyong.choi normally it shouldn’t take that long to get the node address. Why do you want to change this timeout value?

jinyong.choi · March 19, 2024, 8:50am

Hi~
Sorry for the late reply.
You are absolutely correct: the connection time is short under normal circumstances.
I’m running on a worker node in the cluster, and when the head node is down, ray.init() waits for up to 60 seconds before giving up.
Compared to normal operation, a 60-second wait seems too long to me.

What is your perspective on the issue?

thvaisa · July 26, 2024, 10:28am

I have a similar problem. Users are allowed to create a Ray cluster with an external cluster manager, and the main software then connects to the cluster and does whatever it was designed to do. If the user has forgotten to launch their cluster, or the address/port is misconfigured, the main software gets stuck in ray.init() for some time because the head node is not alive.

A timeout for ray.init() would be a nice addition, but I’ll manage without one for now.
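Until ray.init() itself accepts a timeout, one possible caller-side workaround is a quick TCP reachability check against the head node’s address before calling ray.init(). This is a sketch assuming the address uses the host:port form shown earlier in the thread; the function name is hypothetical, and a successful connection only proves that something is listening on that port, not that it is a healthy Ray head node.

```python
import socket


def head_node_reachable(address: str, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to `address` ("host:port") succeeds
    within `timeout_s` seconds. This does not verify that the listener is Ray."""
    host, sep, port_str = address.rpartition(":")
    # Allow bracketed IPv6 literals such as "[::1]:6379".
    host = host.strip("[]")
    if not sep or not host or not port_str.isdigit():
        raise ValueError(f"expected 'host:port', got {address!r}")
    try:
        with socket.create_connection((host, int(port_str)), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures and timeouts.
        return False
```

A driver could then fail fast with a clear message, for example by calling ray.init(address=addr) only when head_node_reachable(addr) is True and exiting otherwise. The check adds at most timeout_s seconds, so a forgotten or misconfigured cluster is reported in about 2 seconds instead of a minute.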

Sam_Chan · July 28, 2024, 3:47am

Can you file a GitHub feature request, @thvaisa? We’ll follow up and triage it.

thvaisa · July 29, 2024, 11:01am

Yes, sure. I’ll do it this week.
