[Medium] Using docker image for service deployment

# Dockerfile for head
FROM python:3.10.13-slim

RUN apt-get update && apt-get install -y g++ gcc libsndfile1 git ffmpeg podman curl

RUN curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  tee /etc/apt/sources.list.d/nvidia-container-toolkit.list \
  && apt-get update
RUN apt-get install -y nvidia-container-toolkit

RUN python -m pip install -U pip==23.3.1
RUN python -m pip install "ray[default,serve]==2.8.0"

ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8

WORKDIR /root/ray/
COPY . .
ENTRYPOINT ["/root/ray/docker/entrypoint.sh"]

# entrypoint.sh
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
RAY_prestart_worker_first_driver=0.0 ray start --head --dashboard-host=0.0.0.0 --block

# service.py

import asyncio
from io import BytesIO

import numpy as np
import torch
from fastapi import FastAPI
from fastapi.responses import Response
from PIL import Image
from ray import serve
from ray.serve.handle import DeploymentHandle

app = FastAPI()


@serve.deployment(num_replicas=1)
@serve.ingress(app)
class APIIngressOD:
    def __init__(self, object_detection_handle) -> None:
        self.handle: DeploymentHandle = object_detection_handle.options(
            use_new_handle_api=True,
        )

    @app.get(
        "/",
        responses={200: {"content": {"image/jpeg": {}}}},
        response_class=Response,
    )
    async def detect(self, image_url: str):
        image = await self.handle.detect.remote(image_url)
        file_stream = BytesIO()
        image.save(file_stream, "jpeg")
        return Response(content=file_stream.getvalue(), media_type="image/jpeg")


@serve.deployment(
    ray_actor_options={"num_gpus": 0.25},
    autoscaling_config={"min_replicas": 2, "max_replicas": 4, "downscale_delay_s": 60},
)
class ObjectDetection:
    def __init__(self):
        self.model = torch.hub.load("ultralytics/yolov5", "yolov5s")
        self.model.cuda()

    async def detect(self, image_url: str):
        loop = asyncio.get_running_loop()
        result_im = await loop.run_in_executor(None, self.model, image_url)
        return Image.fromarray(result_im.render()[0].astype(np.uint8))


# renamed from `app` to avoid shadowing the FastAPI instance above
entrypoint = APIIngressOD.bind(ObjectDetection.bind())
serve.run(entrypoint, name="object_detection", route_prefix="/detect")
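A minimal client sketch for the ingress above. The host and port are assumptions: Serve's default HTTP port 8000 combined with the `/detect` route prefix passed to `serve.run`.

```python
from urllib.parse import urlencode

# Build the request URL for the APIIngressOD endpoint. The image URL is
# just an example input; any publicly reachable image should work.
params = {"image_url": "https://ultralytics.com/images/zidane.jpg"}
url = f"http://localhost:8000/detect?{urlencode(params)}"
print(url)
# The endpoint responds with JPEG bytes, e.g. requests.get(url).content
```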
# Dockerfile for <ray-service>
FROM python:3.10.13-slim

RUN python -m pip install -U pip==23.3.1

COPY . .
RUN python -m pip install Pillow \
    opencv-python \
    "torchvision>=0.16" \
    numpy \
    torch \
    pandas \
    "ray[serve]==2.8.0"

docker build -t <ray-service> .
docker push <ray-service>
RAY_ADDRESS='http://localhost:8265' ray job submit --runtime-env-json '{"container": {"image": "<ray-service>:latest", "run_options": ["--tty", "--privileged", "--cap-drop ALL", "--log-level=debug", "--device nvidia.com/gpu=all", "--security-opt=label=disable",  "--restart unless-stopped"]}, "config": {"eager_install": false}, "env_vars":{"NVIDIA_VISIBLE_DEVICES": "all"}}' -- python service.py
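Hand-writing the `--runtime-env-json` argument in the shell is easy to get wrong (quoting, `false` vs `False`, nesting). A small sketch that builds the same argument in Python and prints a shell-safe string; the image name is the same placeholder as above, and the `run_options` shown are just a subset:

```python
import json
import shlex

# Same runtime_env as the ray job submit command above, as a Python dict.
runtime_env = {
    "container": {
        "image": "<ray-service>:latest",
        "run_options": [
            "--tty",
            "--device nvidia.com/gpu=all",
            "--security-opt=label=disable",
        ],
    },
    "config": {"eager_install": False},
    "env_vars": {"NVIDIA_VISIBLE_DEVICES": "all"},
}

# shlex.quote wraps the JSON so the shell passes it through untouched.
print("--runtime-env-json " + shlex.quote(json.dumps(runtime_env)))
```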

I have had a few problems using this command:

  1. I have to go into the head-node container and pull the image beforehand. Otherwise, no matter how long I wait, the job never completes. Is this because of a timeout on job setup? Can that timeout be adjusted?
  2. After a job is submitted, the image is pulled over and over (see raylet.err), as if there were no limit on attempts. If the image cannot be pulled within some time N, I would expect the job to move to failed status, but it stays pending forever. Is it possible to configure killing jobs that fail to start?
  3. I also tried running my service over gRPC with the image specified. Everything worked: requests went through on port 9000. But as soon as I deployed a second service without an image on port 8000 (specifying only its dependencies via pip), the gRPC service that had been working started returning this response:
    status = StatusCode.NOT_FOUND details = "Application metadata not set. Please ping /ray.serve.RayServeAPIService/ListApplications for available applications."
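For problem 2, one possible workaround (a sketch of an external watchdog, not a built-in Ray feature) is to stop a job from the outside if it stays PENDING too long. `client` stands in for a `ray.job_submission.JobSubmissionClient`; it is duck-typed here so the sketch has no hard Ray dependency, and the comparison uses a plain string because `JobStatus` is a string enum:

```python
import time

def stop_if_stuck(client, job_id, timeout_s=600.0, poll_s=10.0):
    """Stop `job_id` if it is still PENDING after `timeout_s` seconds.

    Returns True if the job was stopped, False if it left PENDING on its own.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # JobStatus inherits from str, so string comparison works with a
        # real JobSubmissionClient as well.
        if client.get_job_status(job_id) != "PENDING":
            return False  # job started (or finished); nothing to do
        time.sleep(poll_s)
    client.stop_job(job_id)
    return True
```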

I tried running the serve start --http-host 0.0.0.0 --grpc-port 9000 --grpc-servicer-functions <test_pb2_grpc.add_TestServicer_to_server> command before starting the services. Now I can’t bring up the second service (the one submitted after the gRPC service that specifies an image):

RAY_ADDRESS='http://localhost:8265' ray job submit \
> --working-dir . \
> --runtime-env-json '{"pip": "requirements.txt", "config": {"eager_install": false}}' \
> -- python service.py

An error is returned:

runtime_env setup failed: Failed to set up runtime environment.
Could not create the actor because its associated runtime env failed to be created.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/asyncio/streams.py", line 501, in _wait_for_data
    await self._waiter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/ray/_private/runtime_env/utils.py", line 91, in check_output_cmd
    stdout, _ = await proc.communicate()
  File "/usr/local/lib/python3.10/asyncio/subprocess.py", line 195, in communicate
    stdin, stdout, stderr = await tasks.gather(stdin, stdout, stderr)
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/ray/_private/runtime_env/agent/runtime_env_agent.py", line 366, in _create_runtime_env_with_retry
    runtime_env_context = await asyncio.wait_for(
  File "/usr/local/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
    return fut.result()
  File "/usr/local/lib/python3.10/site-packages/ray/_private/runtime_env/agent/runtime_env_agent.py", line 326, in _setup_runtime_env
    await create_for_plugin_if_needed(
  File "/usr/local/lib/python3.10/site-packages/ray/_private/runtime_env/plugin.py", line 254, in create_for_plugin_if_needed
    size_bytes = await plugin.create(uri, runtime_env, context, logger=logger)
  File "/usr/local/lib/python3.10/site-packages/ray/_private/runtime_env/pip.py", line 518, in create
    bytes = await task
  File "/usr/local/lib/python3.10/site-packages/ray/_private/runtime_env/pip.py", line 498, in _create_for_hash
    await PipProcessor(
  File "/usr/local/lib/python3.10/site-packages/ray/_private/runtime_env/pip.py", line 400, in _run
    await self._install_pip_packages(
  File "/usr/local/lib/python3.10/site-packages/ray/_private/runtime_env/pip.py", line 376, in _install_pip_packages
    await check_output_cmd(pip_install_cmd, logger=logger, cwd=cwd, env=pip_env)
  File "/usr/local/lib/python3.10/site-packages/ray/_private/runtime_env/utils.py", line 93, in check_output_cmd
    raise RuntimeError(f"Run cmd[{cmd_index}] got exception.") from e
RuntimeError: Run cmd[9] got exception.

Hi @psydok, I’m not sure about 1 and 2. But for 3: this simply means you have multiple applications deployed in Serve, and Serve’s gRPC proxy doesn’t know which one to route the request to. You can pass an “application” entry in your client’s metadata. For an example of how to do it, see: Set Up a gRPC Service — Ray 2.8.1
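A sketch of what that client-side metadata looks like. The application name here is an assumption (whatever name was passed to `serve.run`), and the stub, request, and method names are placeholders for your generated gRPC classes:

```python
# Tell Serve's gRPC proxy which application should handle the call by
# sending an "application" metadata entry with every request.
app_metadata = (("application", "object_detection"),)

# With grpcio and your generated test_pb2 / test_pb2_grpc modules, the
# call would look roughly like this (names are placeholders):
#   import grpc
#   channel = grpc.insecure_channel("localhost:9000")
#   stub = test_pb2_grpc.TestStub(channel)
#   response, call = stub.Predict.with_call(request, metadata=app_metadata)
```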

Hi @psydok, some fixes and improvements were made to this runtime environment feature as part of Ray 2.9. Could you try it out and let me know if it fixes your issues? The full docs are here: Run Multiple Applications in Different Containers — Ray 2.9.0. Note that this is still an experimental feature, so if you have feature requests or run into issues, please submit an issue on Github!

I seem to have managed to solve these problems in 2.8.1 by setting RAY_worker_register_timeout_seconds=1200.
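In the entrypoint.sh above, that amounts to exporting the variable before the node starts (a sketch; 1200 seconds is the value that worked here, and the original start commands are kept as comments so only the export is live):

```shell
# Give workers more time to register with the raylet, which also covers
# slow container image pulls.
export RAY_worker_register_timeout_seconds=1200
# then start the head as before:
#   nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
#   ray start --head --dashboard-host=0.0.0.0 --block
```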

Thank you, @Gene . I added specifying the application name to the metadata and it worked.

I am now trying to set up a cluster via a Docker environment across different servers. But it seems that specifying node-ip-address breaks everything, starting with the fact that I can’t view the node logs on the dashboard.

Runtime Env Agent timed out as NotFound in 30000ms. Status: NotFound: on_connect Connection refused, address: x.x.x.x, port: 19124, Suiciding...
or

> docker compose exec node serve start --proxy-location EveryNode \
        --http-host 0.0.0.0 --http-port 8099

2023-12-26 10:34:06,574 INFO worker.py:1540 -- Connecting to existing Ray cluster at address: HEAD_IP:6379...
2023-12-26 10:34:06,591 INFO worker.py:1715 -- Connected to Ray cluster. View the dashboard at http://HEAD_IP:8265 
[2023-12-26 10:34:15,600 E 252 282] core_worker_process.cc:216: Failed to get the system config from raylet because it is dead. Worker will terminate. Status: GrpcUnavailable: RPC Error message: failed to connect to all addresses; last error: UNKNOWN: ipv4:CURRENT_EXTERNAL_IP_OF_NODE:35369: Failed to connect to remote host: Connection refused; RPC Error details:  .Please see `raylet.out` for more details.

No, it’s become more of a problem.

The only thing that helped was installing firewalld on top of iptables on the server and restarting Docker; before that it was plain iptables. But I find this solution strange, and I don’t understand why it worked.
If you have any suggestions, could you please share?
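One guess at why the firewall change mattered: by default Ray picks several of its node-to-node ports at random (the Connection refused above was on port 35369), so any firewall that only allows known ports will intermittently break raylet and worker traffic. A sketch of pinning those ports and opening them explicitly; the port numbers below are arbitrary choices for illustration, not Ray defaults, and the actual commands are kept as comments:

```shell
# Arbitrary example ports for the raylet, object manager, and workers.
NODE_MANAGER_PORT=6380
OBJECT_MANAGER_PORT=6381
MIN_WORKER_PORT=10002
MAX_WORKER_PORT=10999

# On each worker node, pin the ports Ray listens on:
#   ray start --address=HEAD_IP:6379 \
#     --node-ip-address=CURRENT_EXTERNAL_IP_OF_NODE \
#     --node-manager-port=$NODE_MANAGER_PORT \
#     --object-manager-port=$OBJECT_MANAGER_PORT \
#     --min-worker-port=$MIN_WORKER_PORT --max-worker-port=$MAX_WORKER_PORT
#
# ...and open exactly those ports in firewalld on every node:
#   firewall-cmd --permanent --add-port=${NODE_MANAGER_PORT}-${OBJECT_MANAGER_PORT}/tcp
#   firewall-cmd --permanent --add-port=${MIN_WORKER_PORT}-${MAX_WORKER_PORT}/tcp
#   firewall-cmd --reload
```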