Pure ray client management without init()

XBeg9 · May 2, 2024, 10:03pm

Hey team, I am looking into the ray.init function and trying to understand how I can better integrate my app. It is a non-Ray app (a dag-worker) that needs to call Ray remote functions on a remote cluster over Ray's internal gRPC communication, so just a simple Ray Client integration. ray.init spins up a global client and then tries to manage everything for me, but this fails when I have 100+ workers each calling ray.init: I get a ton of random disconnects and other errors. Is there an easier way to get a ServeControllerClient without relying on all the ray.init boilerplate? I want full control over client management: connecting, disconnecting, and running remote calls.

The workflow I expect is: given a deployment name, I get the actor and then make a remote call on it. In other words, I expect to get a handle (DeploymentHandle) and then call it like handle.func.remote(arg).
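For comparison, here is a minimal sketch of that workflow through the conventional ray.init path, which is the baseline this thread is trying to move away from rather than a solution to the per-worker client problem. The helper names (split_replica_actor_name, call_deployment), the ray:// address, and the idea that a replica actor name has the layout SERVE_REPLICA::<app>#<deployment>#<replica id> are assumptions; the layout is inferred only from the single actor name quoted in this thread. The Ray imports are deferred so the parsing helper can be used without Ray installed.

```python
from typing import Any, Tuple

REPLICA_PREFIX = "SERVE_REPLICA::"


def split_replica_actor_name(actor_name: str) -> Tuple[str, str, str]:
    """Split a replica actor name into (app, deployment, replica id).

    ASSUMPTION: the layout 'SERVE_REPLICA::<app>#<deployment>#<replica id>'
    is inferred from one example name; it is not a documented contract.
    """
    if not actor_name.startswith(REPLICA_PREFIX):
        raise ValueError(f"not a Serve replica actor name: {actor_name!r}")
    parts = actor_name[len(REPLICA_PREFIX):].split("#")
    if len(parts) != 3:
        raise ValueError(f"unexpected replica name layout: {actor_name!r}")
    app, deployment, replica_id = parts
    return app, deployment, replica_id


def call_deployment(address: str, app_name: str, deployment_name: str,
                    method: str, arg: Any, timeout_s: float = 30.0) -> Any:
    """Connect, call one deployment method, and disconnect again.

    Hypothetical helper: it still goes through the global ray.init client,
    so it does not give per-object control over the connection.
    """
    # Deferred imports: Ray is only needed when a call is actually made.
    import ray
    from ray import serve

    ray.init(address=address)  # e.g. "ray://inference:10001" (assumed form)
    try:
        handle = serve.get_deployment_handle(deployment_name, app_name=app_name)
        # Equivalent to handle.<method>.remote(arg)
        response = getattr(handle, method).remote(arg)
        return response.result(timeout_s=timeout_s)
    finally:
        ray.shutdown()


app, deployment, replica_id = split_replica_actor_name(
    "SERVE_REPLICA::ml#LanguageDetection#dgzaYn"
)
```

With those values, a call would look like call_deployment("ray://inference:10001", app, deployment, "func", arg), where "func" stands in for whatever method the deployment exposes.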

Current status: I was able to initiate a client like this:

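# Imports inferred from the names used below (private Ray modules).
from ray.serve._private.constants import SERVE_NAMESPACE
from ray.util.client import _ClientContext

# Create a standalone client context instead of the global one.
client = _ClientContext()
client.connect(conn_str="inference:10001", namespace=SERVE_NAMESPACE)

print(client.is_connected())
print(client.api.is_initialized())
print(client.api.list_named_actors())
# This last call is the one that raises the error below.
print(client.client_worker.get_actor("SERVE_REPLICA::ml#LanguageDetection#dgzaYn"))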

But I can’t get the actor; it gives me this error:

    self._worker.call_retain(id)
AttributeError: 'NoneType' object has no attribute 'call_retain'
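One possible reading of this error, which is an assumption and not something confirmed in this thread: the object that wraps the returned actor looks up its worker on a process-wide context, not on the _ClientContext that was created by hand, so it finds None. The toy below reproduces only that pattern in plain Python; every name in it (ToyWorker, ToyContext, ToyActorRef, GLOBAL_CONTEXT) is hypothetical and none of it is Ray code.

```python
class ToyWorker:
    """Stand-in for a connected client worker."""

    def __init__(self):
        self.retained = []

    def call_retain(self, ref_id):
        self.retained.append(ref_id)


class ToyContext:
    """Stand-in for a client context; the worker exists only after connect()."""

    def __init__(self):
        self.client_worker = None

    def connect(self):
        self.client_worker = ToyWorker()


# Process-wide context, left unconnected on purpose.
GLOBAL_CONTEXT = ToyContext()


class ToyActorRef:
    """Looks up the worker on the global context, not on the caller's context."""

    def __init__(self, ref_id):
        self._worker = GLOBAL_CONTEXT.client_worker
        self._worker.call_retain(ref_id)


def try_make_ref(ref_id):
    """Return 'ok' or the AttributeError message."""
    try:
        ToyActorRef(ref_id)
        return "ok"
    except AttributeError as exc:
        return str(exc)


# A privately created and connected context does not help the ref...
private_context = ToyContext()
private_context.connect()
message_before = try_make_ref(b"actor-1")

# ...but connecting the global context does.
GLOBAL_CONTEXT.connect()
message_after = try_make_ref(b"actor-1")
```

If the real cause has this shape, a hand-built client would need to be registered as the process-wide context before actor references can be created; whether Ray exposes a supported way to do that is not established here.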

XBeg9 · May 2, 2024, 11:23pm

OK, I have taken this in a completely different direction; it looks like with _ClientContext I am missing lots of things.


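# Imports inferred from the names used below (private Ray modules).
import ray
from ray._private.node import Node
from ray._private.parameter import RayParams
from ray._private.worker import Worker, connect
from ray.serve._private.constants import SERVE_NAMESPACE

worker = Worker()

# In this case, we only need to connect the node.
ray_params = RayParams(
    node_ip_address=None,
    gcs_address="10.224.3.52:6379",
    object_ref_seed=None,
    redis_address=None,
    redis_password=None,
    temp_dir="./ray",
    _system_config={},
    enable_object_reconstruction=False,
    metrics_export_port=None,
)

# Attach to an existing node instead of starting new Ray processes.
node = Node(
    ray_params,
    head=False,
    shutdown_at_exit=False,
    spawn_reaper=False,
    connect_only=True,
)

connect(
    node,
    node.session_name,
    mode=0,  # SCRIPT_MODE = 0
    log_to_driver=True,
    worker=worker,
    job_id=None,
    namespace=SERVE_NAMESPACE,
    job_config=ray.job_config.JobConfig(),
    entrypoint=ray._private.utils.get_entrypoint_name(),
)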

Getting this error: 2024-05-02 16:19:49,938 INFO node.py:1001 -- Can't find a `node_ip_address.json` file from ray/session_2024-05-01_14-56-44_672413_8. Have you started Ray instsance using `ray start` or `ray.init`?

XBeg9 · May 2, 2024, 11:25pm

If I give node_ip_address, like this:

RayParams(
        node_ip_address="10.224.3.52",
        gcs_address="10.224.3.52:6379",

--------> [2024-05-02 16:24:11,152 C 99963 4993918] raylet_client.cc:60: Could not connect to socket /tmp/ray/session_2024-05-01_14-56-44_672413_8/sockets/raylet
