Hi,
I want to serve my LLM model with Ray Serve. This is probably a basic problem, but I am stuck on it.
My app2.py:
from starlette.requests import Request
from starlette.responses import JSONResponse
from ray import serve


@serve.deployment(route_prefix="/forecast")
class Ray_llm:
    async def __call__(self, request: Request):
        # Parse the multipart form data from the Starlette request
        form = await request.form()

        if "file" not in form:
            return JSONResponse({"error": "No file part in the request"}, status_code=400)
        file = form["file"]
        if file.filename == "":
            return JSONResponse({"error": "No selected file"}, status_code=400)

        query_text = form.get("query", None)
        if not query_text:
            return JSONResponse({"error": "No query text provided"}, status_code=400)

        return send_to_llm(file, query_text)


def send_to_llm(file, query_text):
    response = llm_caller(file, query_text)  # llm_caller wraps my model (defined elsewhere)
    return response


app = Ray_llm.bind()
serve.run(app, port=8081)
Dockerfile:
FROM python:3.8-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install "ray[serve]"
EXPOSE 8081
COPY . .
CMD ["python", "app2.py"]
BuildAndRun.sh:
docker build -t llm_api .
docker run -p 8081:5000 llm_api
When I run BuildAndRun.sh, the service never responds to my POST requests. How can I solve this problem?
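For reference, this is roughly how I try to call the endpoint once the container is up (a minimal sketch; sample.txt and the query string are just placeholders, and I am assuming the service should answer at http://localhost:8081/forecast):

import requests

url = "http://localhost:8081/forecast"  # host port from the docker run -p mapping

# Send the file as a multipart upload plus the query as a form field,
# matching the "file" and "query" keys the deployment reads from request.form()
with open("sample.txt", "rb") as f:  # placeholder file
    resp = requests.post(
        url,
        files={"file": f},
        data={"query": "What does this document say?"},  # placeholder query
    )

print(resp.status_code, resp.text)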
Screenshot: