- High: It blocks me from completing my task.
Hey folks, we’re just getting started with Ray Serve and we’d like to deploy a stand-alone Ray Serve instance in Docker on k8s without a Ray Cluster for now. We’re managing autoscaling and such on our own at the moment, so it’s enough to just have single nodes running without a full Ray Cluster.
Very early on, though, I’m running into difficulty just getting a single Ray Serve instance to run in a Docker container. Everything I try hits what appears to be the same problem, which looks related to ports/sockets in Docker, but I’m not certain. Here are some snippets of what we’ve started with (this works fine outside of Docker) and the errors we see in the various logs. Any help is appreciated!
main.py:
from fastapi import FastAPI
from ray import serve
from ray.serve.handle import DeploymentHandle

# [snip]

api = FastAPI()


@serve.deployment(num_replicas=1)
@serve.ingress(api)
class APIIngress:
    def __init__(self, model_handle: DeploymentHandle) -> None:
        self.model_handle = model_handle

    @api.get("/health")
    async def health(self):
        return {"status": "ok"}

    @api.post("/embedding/")
    async def embedding(self, req: TextEmbeddingRequest):
        # [snip]
        ...


entrypoint = APIIngress.bind(model)
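The snipped parts include the embedding model deployment that `model` refers to. Heavily simplified, it’s shaped roughly like this (class and method names here are just placeholders, not the real code):

from ray import serve


@serve.deployment(num_replicas=1)
class EmbeddingModel:
    """Placeholder for the real model deployment (snipped in main.py above)."""

    def __init__(self) -> None:
        # Load the actual embedding model here.
        ...

    async def embed(self, text: str) -> list[float]:
        # Run inference and return the embedding vector.
        ...


# The bound model app; this is what gets passed to APIIngress.bind(...) above.
model = EmbeddingModel.bind()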
Dockerfile:
FROM rayproject/ray:2.23.0-py311-cpu
[snip]
ENTRYPOINT ["poetry", "run", "serve", "run", "main:entrypoint"]
pyproject.toml:
python = "~3.11"
fastapi = "~0.111"
ray = {version = "~2.23", extras = ["serve"]}
The logs show the following; all other logs are empty or don’t contain relevant error output, as far as I can tell. For example, the {dashboard_agent|runtime_env_agent}.log files referenced in the error below don’t even exist.
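For reference, this is how I’m listing what does exist under the session log directory inside the container (the path comes from the raylet error further down):

import pathlib

# List what actually exists under Ray's session log directory inside the
# container; dashboard_agent.log and runtime_env_agent.log are not there.
log_dir = pathlib.Path("/tmp/ray/session_latest/logs")
for path in sorted(log_dir.glob("*")):
    print(path.name)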
In python-core-driver-01000000ffffffffffffffffffffffffffffffffffffffffffffffff_18.log:
[2024-06-03 07:26:00,339 I 18 18] core_worker_process.cc:107: Constructing CoreWorkerProcess. pid: 18
[2024-06-03 07:26:00,355 I 18 18] io_service_pool.cc:35: IOServicePool is running with 1 io_service.
[2024-06-03 07:26:00,643 W 18 18] raylet_client.cc:88: The connection is failed because the local raylet has been dead. Terminate the process. Status: IOError: No such file or directory
In raylet.err:
[2024-06-03 07:26:00,367 E 144 203] (raylet) agent_manager.cc:83: The raylet exited immediately because one Ray agent failed, agent_name = dashboard_agent/424238335.
The raylet fate shares with the agent. This can happen because
- The version of `grpcio` doesn't follow Ray's requirement. Agent can segfault with the incorrect `grpcio` version. Check the grpcio version `pip freeze | grep grpcio`.
- The agent failed to start because of unexpected error or port conflict. Read the log `cat /tmp/ray/session_latest/logs/{dashboard_agent|runtime_env_agent}.log`. You can find the log file structure here https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#logging-directory-structure.
- The agent is killed by the OS (e.g., out of memory).
[2024-06-03 07:26:00,367 E 144 205] (raylet) agent_manager.cc:83: The raylet exited immediately because one Ray agent failed, agent_name = runtime_env_agent.
The raylet fate shares with the agent. This can happen because
- The version of `grpcio` doesn't follow Ray's requirement. Agent can segfault with the incorrect `grpcio` version. Check the grpcio version `pip freeze | grep grpcio`.
- The agent failed to start because of unexpected error or port conflict. Read the log `cat /tmp/ray/session_latest/logs/{dashboard_agent|runtime_env_agent}.log`. You can find the log file structure here https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#logging-directory-structure.
- The agent is killed by the OS (e.g., out of memory).
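Per the first bullet in that error, I can check which grpcio version poetry actually resolved inside the image with something like this (equivalent to the suggested `pip freeze | grep grpcio`):

import importlib.metadata as metadata

# Print the versions resolved inside the image, mirroring the
# `pip freeze | grep grpcio` check suggested by the raylet error above.
for pkg in ("ray", "grpcio", "fastapi"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")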