Can I deploy services to other machines in the cluster?

839576266 · March 18, 2022, 1:38am

I have a cluster of three machines (A, B, C). I want to specify B's IP through serve.start() on machine A, but it reports that binding fails. Does Ray Serve support this kind of deployment?

eoakes · March 18, 2022, 4:37pm

@839576266 you only need to call serve.start on one machine; the replicas of your deployments will be placed across the cluster.

The http_host is the address that the HTTP server binds to on each node (typically localhost or 0.0.0.0).

839576266 · March 19, 2022, 5:59am

I tried binding the host in serve.start(), but it doesn't seem to work. Is there anything else I need to configure?

```python
import time

import ray
from ray import serve

environment_dict = {
    "working_dir": "/home/hwd/kernel"
}

# Connect to the head node (10.3.70.138) through Ray Client.
ray.init(address='ray://10.3.70.138:10001', runtime_env=environment_dict)

# Ask the HTTP proxy to bind to the worker node's IP (10.3.70.140).
http_dict = {"host": "10.3.70.140", "port": 9009}
serve.start(http_options=http_dict)


@serve.deployment
def hello(request):
    name = request.query_params["name"]
    return f"Hello {name}!"


# Deploy model.
info = hello.options(num_replicas=3).deploy()
# Keep the script alive so the deployment keeps serving.
while True:
    time.sleep(5)
```

But the script raised an exception:

```
Traceback (most recent call last):
  File "/tmp/pycharm_project_188/tests/hik_serving/test/ray_test.py", line 19, in <module>
    serve.start(http_options=http_dict)
  File "/usr/lib/fedai/hikflkernel/.venv/lib/python3.8/site-packages/ray/serve/api.py", line 465, in start
    ray.get(
  File "/usr/lib/fedai/hikflkernel/.venv/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return getattr(ray, func.__name__)(*args, **kwargs)
  File "/usr/lib/fedai/hikflkernel/.venv/lib/python3.8/site-packages/ray/util/client/api.py", line 42, in get
    return self.worker.get(vals, timeout=timeout)
  File "/usr/lib/fedai/hikflkernel/.venv/lib/python3.8/site-packages/ray/util/client/worker.py", line 359, in get
    res = self._get(to_get, op_timeout)
  File "/usr/lib/fedai/hikflkernel/.venv/lib/python3.8/site-packages/ray/util/client/worker.py", line 386, in _get
    raise err
types.RayTaskError(ValueError): ray::HTTPProxyActor.ready() (pid=27111, ip=10.3.70.138, repr=<ray.serve.http_proxy.HTTPProxyActor object at 0x7f8f8119be20>)
OSError: [Errno 99] Cannot assign requested address

During handling of the above exception, another exception occurred:

ray::HTTPProxyActor.ready() (pid=27111, ip=10.3.70.138, repr=<ray.serve.http_proxy.HTTPProxyActor object at 0x7f8f8119be20>)
  File "/usr/local/abm/python/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/abm/python/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/usr/lib/fedai/hikflkernel/.venv/lib/python3.8/site-packages/ray/serve/http_proxy.py", line 329, in ready
    return await done_set.pop()
  File "/usr/lib/fedai/hikflkernel/.venv/lib/python3.8/site-packages/ray/serve/http_proxy.py", line 348, in run
    raise ValueError(
ValueError: Failed to bind Ray Serve HTTP proxy to '10.3.70.140:9009'.
Please make sure your http-host and http-port are specified correctly.
```

eoakes · March 21, 2022, 4:44pm

@839576266 are you sure that 10.3.70.140 is the right address to bind to on the cluster? If you try localhost or 0.0.0.0 does it work?
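The Errno 99 in the traceback comes from the operating system rather than from Ray: bind() only accepts an address that belongs to one of the local machine's network interfaces. The traceback shows the HTTPProxyActor running on ip=10.3.70.138, so asking it to bind to 10.3.70.140 fails there. The standalone sketch below uses only Python's socket module (no Ray involved; try_bind is a hypothetical helper name) to reproduce the same error and to show that 0.0.0.0 and 127.0.0.1 bind fine.

```python
import errno
import socket
from typing import Optional


def try_bind(host: str, port: int = 0) -> Optional[int]:
    """Try to bind a TCP socket to (host, port).

    Returns None on success, or the errno of the OSError on failure.
    Port 0 asks the OS for any free port, so only the host matters here.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return None
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


if __name__ == "__main__":
    # 0.0.0.0 means "all local interfaces" and 127.0.0.1 is always local,
    # so both succeed on any node. 10.3.70.140 is the worker node's address
    # from this thread; on any other machine it is not a local interface, so
    # bind() fails with EADDRNOTAVAIL (errno 99 on Linux, i.e.
    # "Cannot assign requested address").
    for host in ["0.0.0.0", "127.0.0.1", "10.3.70.140"]:
        err = try_bind(host)
        status = "ok" if err is None else errno.errorcode.get(err, str(err))
        print(f"{host:<12} -> {status}")
```

Run on 10.3.70.140 itself, the third bind would succeed; that is why a bind address such as 0.0.0.0, which is valid on every node, is the usual choice.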

839576266 · March 22, 2022, 11:16am

I have deployed a Ray cluster where 10.3.70.138 is the head node and 10.3.70.140 is a worker node. I am running this program on 10.3.70.138, and if I use localhost, the Serve deployment is created on 10.3.70.138. But I want to deploy it on 10.3.70.140, and I'm not sure whether Ray supports this.

eoakes · March 22, 2022, 6:09pm

@839576266 the replicas of the deployment will run across the cluster, you don’t need to set the host for each node in the cluster individually.

839576266 · March 23, 2022, 2:23am

What do replicas refer to in Ray Serve? I completed my deployment with Hello.options(num_replicas=3).deploy(), but the port is only listening on the head node, and I can't find any related processes on the other machines in the cluster.

simon-mo · March 23, 2022, 5:30pm

The HTTP proxy on the head node load-balances requests across the Hello replicas, wherever in the cluster they are running.
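To make the answer above concrete, here is a toy model in plain Python, not Ray's code: one proxy on the head node is the only HTTP entry point, and it forwards each request to one of several replicas that live on different nodes. The names Replica, place_replicas, and HeadNodeProxy are hypothetical, "node-c" is a placeholder because the thread never gives the third machine's IP, and the even round-robin placement and routing are simplifications; Ray Serve's own replica scheduling and request routing are more involved.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Replica:
    """One copy of the deployment's handler, living on some node."""
    name: str
    node_ip: str

    def handle(self, query_name: str) -> str:
        # Same logic as the `hello` deployment in the thread.
        return f"Hello {query_name}!"


def place_replicas(num_replicas: int, node_ips: List[str]) -> List[Replica]:
    """Spread replicas over nodes round-robin (a stand-in for Ray's scheduler)."""
    return [
        Replica(name=f"hello#{i}", node_ip=node_ips[i % len(node_ips)])
        for i in range(num_replicas)
    ]


class HeadNodeProxy:
    """Single HTTP entry point on the head node that forwards to replicas."""

    def __init__(self, replicas: List[Replica]) -> None:
        # Cycle through replicas so each request goes to the next one.
        self._cycle = itertools.cycle(replicas)

    def route(self, query_name: str) -> Dict[str, str]:
        replica = next(self._cycle)
        return {"served_by": replica.node_ip, "body": replica.handle(query_name)}


if __name__ == "__main__":
    # "node-c" is a placeholder for the third machine's address.
    nodes = ["10.3.70.138", "10.3.70.140", "node-c"]
    proxy = HeadNodeProxy(place_replicas(3, nodes))
    for who in ["alice", "bob", "carol"]:
        print(proxy.route(who))
```

In this model only the proxy holds a listening socket; replicas are reached by direct calls (in Ray, actor method calls), which is consistent with seeing port 9009 open only on the head node while the replicas still run on other machines.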
