I have been trying to build a Docker container for a Ray Serve server using the following Dockerfile:
FROM ubuntu:latest
# Install Python and other necessary packages
RUN apt-get update && \
apt-get install -y python3-pip python3-dev build-essential
# Install the Python dependencies from requirements.txt
COPY requirements.txt /app/requirements.txt
RUN pip3 install --upgrade pip
RUN pip3 install -r /app/requirements.txt
# Copy the application code
COPY ./app /app
# COPY ./models /models
WORKDIR /app
# Expose the Ray dashboard (8265) and application (8001) ports
EXPOSE 8265
EXPOSE 8001
# Command to start the Ray Serve app
CMD ["python3", "main.py"]
The main.py is below:
import ray
from ray import serve
from fastapi import FastAPI
from starlette.requests import Request
from starlette.responses import JSONResponse
from llama_index.llms.llama_cpp import LlamaCPP
import subprocess, uvicorn

# Start a local Ray head node before connecting to it
subprocess.run(["ray", "start", "--head", "--node-ip-address", "0.0.0.0", "--port", "8001"])

app = FastAPI()

# Initialize Ray and (re)start Serve
ray.init(address="auto", namespace="llama")
serve.shutdown()
serve.start(detached=True)


@serve.deployment()
@serve.ingress(app)
class LlamaModelDeployment:
    def __init__(self, model_path):
        # Load the model (adjust the model path as necessary)
        self.model = LlamaCPP(
            model_path=model_path,
            temperature=0.2,
            max_new_tokens=1024,
            context_window=2048,
            model_kwargs={"n_gpu_layers": 0},
            verbose=True,
        )

    @app.get("/hello")
    async def hello(self):
        return JSONResponse({"response": "Hello World"})

    @app.post("/llama")
    async def llama(self, request: Request):
        data = await request.json()
        # Process the request and generate a response using the model
        input_text = data.get("input", "")
        output = self.model.complete(input_text)
        return JSONResponse({"response": output.dict()})


if __name__ == "__main__":
    # Deploy the model
    model_path = "/models/phi-2.Q2_K.gguf"
    serve.run(LlamaModelDeployment.bind(model_path), route_prefix="/")

    # test server
    import requests
    resp = requests.get("http://0.0.0.0:8001/hello")
    print(resp)

    # don't let the Ray container exit
    import time
    while True:
        time.sleep(10)
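For reference, the /llama route reads an "input" field from the JSON body; a typical in-container call (same host and port as the self-test above) looks roughly like this:
import requests

# the handler reads data.get("input", ""), so the body is a single "input" field
payload = {"input": "Say hello"}
resp = requests.post("http://0.0.0.0:8001/llama", json=payload)
print(resp.json())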
The Docker Compose file is:
services:
  llamaserve:
    build:
      context: ./backend
      dockerfile: Dockerfile
    container_name: llamaserve_app
    volumes:
      - /path/to/models:/models
    ports:
      - "8265:8265"
      - "8001:8001"
I keep getting 'Connection aborted.' / BadStatusLine errors when calling the server. Running main.py outside the Docker container works fine. So far I have tried different base images (including the ones provided by Ray), changing Docker configurations, and reverting to plain Ray Serve handlers instead of the FastAPI ingress, but nothing has worked.
Any help would be appreciated!
How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.