Connecting to multiple Ray clusters

1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.

2. Environment:

  • Ray version: 2.45.0
  • Python version: 3.10.17
  • OS: Ubuntu 20.04.6 LTS
  • Cloud/Infrastructure: N/A
  • Other libs/tools (if relevant):

3. What happened vs. what you expected:

  • Expected: No error
  • Actual: Exception: Can't run an actor the server doesn't have a handle for

Hello,

I have two Ray Serve clusters, one hosted on a local machine and the other on a remote machine. I want to reach both of them from my frontend server. Connecting to multiple Ray clusters appears to be supported (at least experimentally), but I have not been able to get it working.

Here is a minimal script I’ve written and tried out with my setup:

```python
import ray
from ray import serve

local_ip = 'ray://{local-ip}:10001'
remote_ip = 'ray://{remote-ip}:10001'


print("initing local cluster connection")
ray.init(local_ip)

print("initing remote cluster connection")
remote_client = ray.init(remote_ip, allow_multiple=True)


print("trying to reach RequestDeployment on local cluster")
local_handle = serve.get_app_handle("Request")

print("trying to reach RequestDeployment on remote cluster")
with remote_client:
    remote_handle = serve.get_app_handle("Request")
```

When I run this, everything works until the last line, which raises an exception:

```text
initing local cluster connection
2025-05-05 16:57:34,511 INFO client_builder.py:244 -- Passing the following kwargs to ray.init() on the server: log_to_driver
SIGTERM handler is not set because current thread is not the main thread.
initing remote cluster connection
2025-05-05 16:57:35,735 INFO client_builder.py:244 -- Passing the following kwargs to ray.init() on the server: log_to_driver
SIGTERM handler is not set because current thread is not the main thread.
trying to reach RequestDeployment on local cluster
trying to reach RequestDeployment on remote cluster
Traceback (most recent call last):
  File ".../multiray.py", line 20, in <module>
    remote_handle = serve.get_app_handle("Request")
  File ".../envs/service/lib/python3.10/site-packages/ray/serve/api.py", line 873, in get_app_handle
    ingress = ray.get(client._controller.get_ingress_deployment_name.remote(name))
  File ".../envs/service/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File ".../envs/service/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 102, in wrapper
    return getattr(ray, func.__name__)(*args, **kwargs)
  File ".../envs/service/lib/python3.10/site-packages/ray/util/client/api.py", line 42, in get
    return self.worker.get(vals, timeout=timeout)
  File ".../envs/service/lib/python3.10/site-packages/ray/util/client/worker.py", line 433, in get
    res = self._get(to_get, op_timeout)
  File ".../envs/service/lib/python3.10/site-packages/ray/util/client/worker.py", line 450, in _get
    req = ray_client_pb2.GetRequest(ids=[r.id for r in ref], timeout=timeout)
  File ".../envs/service/lib/python3.10/site-packages/ray/util/client/worker.py", line 450, in <listcomp>
    req = ray_client_pb2.GetRequest(ids=[r.id for r in ref], timeout=timeout)
  File ".../envs/service/lib/python3.10/site-packages/ray/util/client/common.py", line 135, in id
    return self.binary()
  File ".../envs/service/lib/python3.10/site-packages/ray/util/client/common.py", line 114, in binary
    self._wait_for_id()
  File ".../envs/service/lib/python3.10/site-packages/ray/util/client/common.py", line 191, in _wait_for_id
    self._set_id(self._id_future.result(timeout=timeout))
  File ".../envs/service/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File ".../envs/service/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
Exception: Can't run an actor the server doesn't have a handle for
```

NOTE: I am able to connect to each cluster on its own. In fact, I get essentially the same error if I swap remote_ip and local_ip, which suggests the issue lies in how I interact with the “non-default” cluster.

Any help would be appreciated!

Hey there michaelripa, welcome to the Ray community! I was able to find an example of a similar error on the forums here: Exception("Can't run an actor the server doesn't have a handle for")

While it’s not exactly the same situation, can you let me know if this helps? (There’s more discussion here: Serve.shutdown() and how to reconnect to cluster - #4 by luisp)

Hi, I saw this post, but unfortunately it doesn’t apply to my situation, since I don’t explicitly call ray.shutdown() anywhere.