1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.
2. Environment:
- Ray version: 2.45.0
- Python version: 3.10.17
- OS: Ubuntu 20.04.6 LTS
- Cloud/Infrastructure: N/A
- Other libs/tools (if relevant):
3. What happened vs. what you expected:
- Expected: No error
- Actual:
Exception: Can't run an actor the server doesn't have a handle for
Hello,
I have two Ray Serve clusters: one hosted on a local machine and one hosted on a remote machine. I want to reach both of them from my frontend server. Connecting to multiple Ray clusters seems to be supported (at least experimentally), but I have been unable to get it working.
Here is a minimal script I’ve written and tried out with my setup:
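import ray
from ray import serve

# Ray Client addresses (placeholders for the real IPs)
local_ip = 'ray://{local-ip}:10001'
remote_ip = 'ray://{remote-ip}:10001'

# The first connection becomes the default client context
print("initing local cluster connection")
ray.init(local_ip)

# allow_multiple=True returns a separate, non-default client context
print("initing remote cluster connection")
remote_client = ray.init(remote_ip, allow_multiple=True)

print("trying to reach RequestDeployment on local cluster")
local_handle = serve.get_app_handle("Request")

# Inside the with block, Ray calls should go to the remote cluster
print("trying to reach RequestDeployment on remote cluster")
with remote_client:
    remote_handle = serve.get_app_handle("Request")  # raises the exception below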
When I run this, everything works until the last line, which throws an exception:
initing local cluster connection
2025-05-05 16:57:34,511 INFO client_builder.py:244 -- Passing the following kwargs to ray.init() on the server: log_to_driver
SIGTERM handler is not set because current thread is not the main thread.
initing remote cluster connection
2025-05-05 16:57:35,735 INFO client_builder.py:244 -- Passing the following kwargs to ray.init() on the server: log_to_driver
SIGTERM handler is not set because current thread is not the main thread.
trying to reach RequestDeployment on local cluster
trying to reach RequestDeployment on remote cluster
Traceback (most recent call last):
  File ".../multiray.py", line 20, in <module>
    remote_handle = serve.get_app_handle("Request")
  File ".../envs/service/lib/python3.10/site-packages/ray/serve/api.py", line 873, in get_app_handle
    ingress = ray.get(client._controller.get_ingress_deployment_name.remote(name))
  File ".../envs/service/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File ".../envs/service/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 102, in wrapper
    return getattr(ray, func.__name__)(*args, **kwargs)
  File ".../envs/service/lib/python3.10/site-packages/ray/util/client/api.py", line 42, in get
    return self.worker.get(vals, timeout=timeout)
  File ".../envs/service/lib/python3.10/site-packages/ray/util/client/worker.py", line 433, in get
    res = self._get(to_get, op_timeout)
  File ".../envs/service/lib/python3.10/site-packages/ray/util/client/worker.py", line 450, in _get
    req = ray_client_pb2.GetRequest(ids=[r.id for r in ref], timeout=timeout)
  File ".../envs/service/lib/python3.10/site-packages/ray/util/client/worker.py", line 450, in <listcomp>
    req = ray_client_pb2.GetRequest(ids=[r.id for r in ref], timeout=timeout)
  File ".../envs/service/lib/python3.10/site-packages/ray/util/client/common.py", line 135, in id
    return self.binary()
  File ".../envs/service/lib/python3.10/site-packages/ray/util/client/common.py", line 114, in binary
    self._wait_for_id()
  File ".../envs/service/lib/python3.10/site-packages/ray/util/client/common.py", line 191, in _wait_for_id
    self._set_id(self._id_future.result(timeout=timeout))
  File ".../envs/service/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File ".../envs/service/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
Exception: Can't run an actor the server doesn't have a handle for
NOTE: I am able to connect to both clusters independently. In fact, I get essentially the same error if I swap remote_ip and local_ip, meaning the issue is related to how I interface with the “non-default” cluster.
Any help would be appreciated!