Setting the timeout of _wait_and_get_for_node_address in ray.init() via an environment variable

Hi there,
It would be more convenient to have a configurable timeout when connecting, as in the call shown below.

What if we could easily adjust this timeout by setting an environment variable that is read when _wait_and_get_for_node_address is called?
60 seconds is too long for me. :sweat_smile:

Would this modification go against Ray's design principles?

ray.init(address="127.0.0.1:6379")

In node.py:
        node_ip_address = ray_params.node_ip_address
        if node_ip_address is None:
            if connect_only:
                node_ip_address = self._wait_and_get_for_node_address()
            else:
                node_ip_address = ray.util.get_node_ip_address()


def _wait_and_get_for_node_address(self, timeout_s: int = 60) -> str:
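A minimal sketch of what the proposal could look like, assuming a hypothetical environment variable RAY_NODE_ADDRESS_TIMEOUT_S (not an existing Ray setting). The polling loop and the `lookup` callable are illustrative stand-ins for Ray's internal _wait_and_get_for_node_address, not its actual implementation:

```python
import os
import time
from typing import Callable, Optional

# Hypothetical variable name for illustration; Ray does not define it.
TIMEOUT_ENV_VAR = "RAY_NODE_ADDRESS_TIMEOUT_S"
DEFAULT_TIMEOUT_S = 60


def resolve_timeout_s(default: int = DEFAULT_TIMEOUT_S) -> int:
    """Return the timeout from the environment, falling back to `default`."""
    raw = os.environ.get(TIMEOUT_ENV_VAR)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        # Ignore malformed values rather than failing the connection.
        return default
    return value if value > 0 else default


def wait_for_node_address(
    lookup: Callable[[], Optional[str]],
    timeout_s: Optional[int] = None,
    poll_interval_s: float = 0.5,
) -> str:
    """Poll `lookup` (hypothetical) until it returns an address or time runs out."""
    if timeout_s is None:
        timeout_s = resolve_timeout_s()
    deadline = time.monotonic() + timeout_s
    while True:
        address = lookup()
        if address is not None:
            return address
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"No node address found after {timeout_s}s "
                f"(override with {TIMEOUT_ENV_VAR})"
            )
        time.sleep(poll_interval_s)
```

With this shape, an explicit `timeout_s` argument still wins, and the environment variable only changes the default, so existing callers keep the 60-second behavior unless they opt in.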

Hi,

@jinyong.choi, normally it shouldn't take that long to get the node address. Why do you want to change this timeout value?

Hi~
Sorry for the late reply.
You are absolutely correct.
The connection time is short under normal circumstances.
I'm running this on a worker node in the cluster, and when the head node is down, the call blocks for up to 60 seconds.
Compared to normal operation, a 60-second wait seems a bit long to me.

What is your perspective on the issue? :grinning:

I have a similar problem. Users are allowed to create a Ray cluster with an external cluster manager, and then the main software connects to the cluster and does whatever it was designed to do. If the user has forgotten to launch their cluster, or the address/port is misconfigured, the main software gets stuck in ray.init for some time because the head node is not alive.

A timeout for ray.init would be a nice addition, but I'll manage without it for now.
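Until such a timeout exists, one possible client-side workaround is to probe the head node's port with a short TCP connection before calling ray.init(). This is a sketch under the assumption that the address passed to ray.init() has the form "host:port"; `head_node_reachable` is a hypothetical helper, not a Ray API, and an open port only shows that something is listening, not that Ray is healthy:

```python
import socket


def head_node_reachable(address: str, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to `address` ("host:port") succeeds.

    Hypothetical pre-flight check; it does not talk to Ray itself.
    """
    host, _, port = address.rpartition(":")
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # on failure; the socket is closed immediately on success.
        with socket.create_connection((host, int(port)), timeout=timeout_s):
            return True
    except OSError:
        return False
```

A caller could then invoke ray.init(address="127.0.0.1:6379") only when head_node_reachable("127.0.0.1:6379", timeout_s=5) returns True, and report a clear error otherwise instead of blocking.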

Can you file a GitHub feature request, @thvaisa? We'll follow up and triage it.

Yes, sure. I'll do it this week.
