Ray Serve LLM application

Hi,

I want to serve my LLM model with Ray Serve.

It is probably a basic problem, but I am stuck on it.

my app2.py:

from ray import serve
import starlette
from starlette.responses import JSONResponse


@serve.deployment(route_prefix="/forecast")
class Ray_llm:
    async def __call__(self, request: starlette.requests.Request):
        # Parse the multipart form from the incoming Starlette request
        # (file uploads need the python-multipart package installed).
        form = await request.form()

        if "file" not in form:
            return JSONResponse({"error": "No file part in the request"}, status_code=400)

        file = form["file"]
        if file.filename == "":
            return JSONResponse({"error": "No selected file"}, status_code=400)

        query_text = form.get("query", None)
        if not query_text:
            return JSONResponse({"error": "No query text provided"}, status_code=400)

        response = send_to_llm(file, query_text)
        return response


def send_to_llm(file, query_text):
    response = llm_caller(file, query_text)  # Send to my model
    return response


app = Ray_llm.bind()
serve.run(app, port=8081)

Dockerfile:

FROM python:3.8-slim

WORKDIR /app

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt
RUN pip install "ray[serve]"

EXPOSE 8081

COPY . .

CMD ["python", "app2.py"]

BuildAndRun.sh:

docker build -t llm_api .

docker run -p 8081:8081 llm_api

When I run BuildAndRun.sh, the service does not respond to POST requests. How can I solve this problem?
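
For reference, this is roughly how I am trying to call the service from the host machine (a minimal test-client sketch; the file name and query text are just placeholders):

import requests

# Hypothetical test client: POST a file plus a "query" form field
# to the deployment's /forecast route on the mapped host port.
url = "http://localhost:8081/forecast"

with open("sample_input.txt", "rb") as f:  # placeholder file
    resp = requests.post(
        url,
        files={"file": f},
        data={"query": "Summarize this document"},  # placeholder query
    )

print(resp.status_code, resp.text)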
