Deploying Ray Serve on Kubernetes


I have a Ray cluster running on Kubernetes. Starting a client works, but I hit a strange issue when creating a backend (the example is from the Key Concepts page of the Ray v1.1.0 documentation):

>>> client.create_backend("simple_backend_class", RequestHandler, "hello, world!")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/", line 31, in check
    return f(self, *args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/", line 295, in create_backend
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/", line 1379, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RayServeException): ray::ServeController.create_backend() (pid=75, ip=
  File "python/ray/_raylet.pyx", line 463, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 412, in ray._raylet.execute_task.function_executor
  File "python/ray/_raylet.pyx", line 1501, in ray._raylet.CoreWorker.run_async_func_in_event_loop
  File "/home/ray/anaconda3/lib/python3.7/concurrent/futures/", line 428, in result
    return self.__get_result()
  File "/home/ray/anaconda3/lib/python3.7/concurrent/futures/", line 384, in __get_result
    raise self._exception
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/", line 836, in create_backend
    raise e
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/", line 833, in create_backend
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/", line 282, in _scale_backend_replicas
    num_possible, current_num_replicas + num_possible))
ray.serve.exceptions.RayServeException: Cannot scale backend simple_backend_class to 1 replicas. Ray Serve tried to add 1 replicas but the resources only allows 0 to be added. To fix this, consider scaling to replica to 0 or add more resources to the cluster. You can check avaiable resources with ray.nodes().
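
The error message points to ray.nodes() for checking resources. Here is a minimal sketch of summing the CPUs reported by alive nodes. It assumes each entry is a dict with an "Alive" flag and a "Resources" mapping that contains "CPU", which is the shape I would expect from ray.nodes() in Ray 1.x. The helper name total_alive_cpus and the sample data are made up for illustration.

```python
from typing import Any, Dict, List


def total_alive_cpus(nodes: List[Dict[str, Any]]) -> float:
    """Sum the "CPU" resource over all nodes flagged as alive."""
    total = 0.0
    for node in nodes:
        if not node.get("Alive", False):
            continue  # dead nodes no longer contribute resources
        total += float(node.get("Resources", {}).get("CPU", 0.0))
    return total


# Made-up sample shaped like the assumed ray.nodes() output.
sample_nodes = [
    {"Alive": True, "Resources": {"CPU": 2.0}},
    {"Alive": False, "Resources": {"CPU": 8.0}},
]
print(total_alive_cpus(sample_nodes))
```

On a live cluster you would pass ray.nodes() instead of sample_nodes, and compare the result with ray.available_resources() to see how much is still free.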

I connected to the Ray cluster like this:

import os

import ray
from ray import serve

if __name__ == "__main__":
    # The head service address is expected in this environment variable.
    if not os.environ.get("RAY_HEAD_SERVICE_HOST"):
        raise ValueError("RAY_HEAD_SERVICE_HOST environment variable empty. "
                         "Is there a ray cluster running?")
    redis_host = os.environ["RAY_HEAD_SERVICE_HOST"]
    ray.init(address=redis_host + ":6379")
    # backend_config = serve.BackendConfig(num_replicas=1)
    # client = serve.start(detached=True, http_host="")
    # Attach to the Serve instance that is already running on the cluster.
    client = serve.connect()

I saw this post, but didn’t find any more information on what it could mean:

Kind regards,

cc @Dmitri @simon-mo

ray.init should be run to connect to the local raylet address instead of the Redis address. Where is the script running: the head pod, a worker pod, or a job?

It’s running on the head pod; redis_host is actually the head node address.
Anyway, in the meantime I moved to Ray 1.2.0, and everything works like a charm for now,
although there were problems with the firewall at first (which ports to open).
In the end, these commands were used to connect to the Ray cluster manually with Ray 1.2.0 and Python 3.6.8:

ray start --head --port=6379 --redis-shard-ports=6380,6381 --object-manager-port=2384 --gcs-server-port=45451

ray start --address='<head_node_ip>:6379' --redis-password='5241590000000000' --object-manager-port=2384
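
For the firewall question, here is a rough sketch for checking from a worker whether the ports named in the commands above are reachable over TCP. It uses only the standard library. The helper name check_ports and the 127.0.0.1 placeholder host are my own, and Ray may open further ports that are not listed here.

```python
import socket
from typing import Dict, Iterable


def check_ports(host: str, ports: Iterable[int],
                timeout: float = 1.0) -> Dict[int, bool]:
    """Return {port: True/False} depending on whether a TCP connect succeeds."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and unreachable hosts.
            results[port] = False
    return results


# Ports taken from the ray start commands above.
RAY_PORTS = [6379, 6380, 6381, 2384, 45451]

if __name__ == "__main__":
    # Placeholder host; use the head node IP when running this from a worker.
    for port, reachable in check_ports("127.0.0.1", RAY_PORTS).items():
        print(port, "open" if reachable else "unreachable")
```

A port reported as unreachable only means the TCP connect failed from where the script ran; it does not say whether the firewall or a stopped Ray process is the cause.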


I just encountered the same error when connecting to the local raylet (i.e. not passing any address explicitly). The instance only has 2 cores; would that be a problem?

Hi, have you tried increasing the --num-cpus argument (or perhaps also the number of replicas in the backend), like this:
ray start --head --num-cpus=6 --port=6379

I had the same problem on a machine with 2 CPUs, but on a machine with 8 CPUs it always worked with the default settings, because num_cpus defaults to the number of CPUs on the machine.
To confirm this, I also tried creating a backend (client.create_backend()) on the machine with 8 CPUs and then checked the cluster resources (ray.cluster_resources() and ray.available_resources()); it would always use 3 CPUs.
It seems that num_cpus is just a logical limit on how many replicas can be scheduled rather than a count of physical cores, because I tried setting it to 500 and that also worked.
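
The numbers above suggest why the 2-CPU machine fails. If Serve plus one backend ends up holding 3 logical CPUs (the figure observed on the 8-CPU machine, not a documented constant), then any num_cpus below 3 leaves no room for the replica. Here is a small arithmetic sketch under that assumption; cpu_headroom is a hypothetical helper name.

```python
# CPUs seen in use after create_backend() on the 8-CPU machine. This is an
# observation from this thread, not a documented Ray Serve constant.
OBSERVED_CPUS_IN_USE = 3


def cpu_headroom(num_cpus: int, cpus_in_use: int = OBSERVED_CPUS_IN_USE) -> int:
    """Logical CPUs left over; a negative value means the request cannot fit."""
    return num_cpus - cpus_in_use


for num_cpus in (2, 6, 8, 500):
    headroom = cpu_headroom(num_cpus)
    status = "fits" if headroom >= 0 else "does not fit"
    print(f"num_cpus={num_cpus}: {status} (headroom {headroom})")
```

Ray treats num_cpus as a logical count for scheduling, so a value above the physical core count (such as 500) is accepted; it does not add compute.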