I have been trying to build a Docker container for a Ray Serve server using the following Dockerfile:
FROM ubuntu:latest
# Install Python and other necessary packages
RUN apt-get update && \
    apt-get install -y python3-pip python3-dev build-essential
# Install Python dependencies from requirements.txt
COPY requirements.txt /app/requirements.txt
RUN pip3 install --upgrade pip
RUN pip3 install -r /app/requirements.txt
# Copy the application code
COPY ./app /app
# COPY ./models /models
WORKDIR /app
# Expose the Ray dashboard port and the application port
EXPOSE 8265
EXPOSE 8001
# Command to start the Ray Serve application
CMD ["python3", "main.py"]
The main.py is below:
import ray
from ray import serve
from fastapi import FastAPI
from starlette.requests import Request
from starlette.responses import JSONResponse
from llama_index.llms.llama_cpp import LlamaCPP
import subprocess, uvicorn
subprocess.run(["ray", "start", "--head", "--node-ip-address", "0.0.0.0" ,"--port", "8001"])
app = FastAPI()
# Initialize Ray
ray.init(address="auto", namespace="llama")
serve.shutdown()
# serve.shutdown()
serve.start(detached=True)
@serve.deployment()
@serve.ingress(app)
class LlamaModelDeployment:
    def __init__(self, model_path):
        # Load the model (adjust the model path as necessary)
        self.model = LlamaCPP(
            model_path=model_path,
            temperature=0.2,
            max_new_tokens=1024,
            context_window=2048,
            model_kwargs={"n_gpu_layers": 0},
            verbose=True,
        )

    @app.get("/hello")
    async def root(self):
        return JSONResponse({"response": "Hello World"})

    @app.post("/llama")
    async def root(self, request: Request):
        data = await request.json()
        # Process the request and generate a response using the LlamaModel instance
        input_text = data.get("input", "")
        output = self.model.complete(input_text)
        return JSONResponse({"response": output.dict()})

if __name__ == "__main__":
    # Deploy the model
    model_path = "/models/phi-2.Q2_K.gguf"
    serve.run(LlamaModelDeployment.bind(model_path), route_prefix="/")

    # test server
    import requests
    resp = requests.get("http://0.0.0.0:8001/hello")
    print(resp)

    # don't let ray docker sleep
    import time
    while True:
        time.sleep(10)
The Docker Compose file is:
services:
  llamaserve:
    build:
      context: ./backend
      dockerfile: Dockerfile
    container_name: llamaserve_app
    volumes:
      - /path/to/models:/models
    ports:
      - "8265:8265"
      - "8001:8001"
I keep getting ‘Connection aborted.’ / BadStatusLine errors. Running main.py outside the Docker container works fine. So far I have tried different base images (including the ones provided by Ray), changing the Docker configuration, and reverting from the FastAPI ingress pattern back to a plain Ray Serve deployment, but nothing has worked.
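For reference, this is roughly the kind of request that fails (a sketch of the test client I run on the host; localhost is an assumption about my setup, and the port comes from the compose mapping above):

import requests

base = "http://localhost:8001"  # port taken from the compose port mapping

# simple health check against the GET route in main.py
print(requests.get(f"{base}/hello", timeout=10).text)

# completion request against the POST route in main.py
resp = requests.post(f"{base}/llama", json={"input": "Hello"}, timeout=60)
print(resp.status_code, resp.text)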
Any help will be appreciated!
How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.