I have an existing FastAPI backend server handling business logic. How can I extend this backend to add a simple machine learning model that runs inference on a POST request?
I have read Ray with FastAPI - Ray Core - Ray and FastAPI + Ray Core vs FastAPI + Ray Serve? - Ray Serve - Ray, but neither thread offers a solution.
From the Ray Serve docs, using @serve.deployment seems to mean placing my entire backend inside the Ray cluster, with the entrypoint serve run xx:xx. What's the advantage of this approach?
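For context, this is roughly how I understand that approach would look (a sketch based on my reading of the @serve.ingress docs; the class name, route, and placeholder body are mine):

    from fastapi import FastAPI
    from ray import serve

    app = FastAPI()

    @serve.deployment
    @serve.ingress(app)
    class BackendIngress:
        # the whole backend lives inside the Serve deployment
        @app.post("/extract_keywords")
        async def extract_keywords(self) -> dict:
            return {"keywords": []}  # placeholder; the real business logic would go here

    backend_app = BackendIngress.bind()
    # started with: serve run my_module:backend_app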
Another approach that makes more sense to me is to keep the original backend and offload only the computation to Ray, using the serve.deployment DeploymentHandle API. How can I achieve this?
This is the Actor running inference code:
    from keybert import KeyBERT
    from ray import serve

    @serve.deployment()
    class KeywordExtractor:
        def __init__(self):
            self.model = KeyBERT()  # load the KeyBERT model once per replica

        def keyword_extract(self, doc: str):
            # returns a list of (keyword, similarity score) tuples
            model_output = self.model.extract_keywords(doc, keyphrase_ngram_range=(1, 1), stop_words=None)
            return model_output

    keyword_extractor = KeywordExtractor.bind()  # a bound application, not yet a handle
And I'm looking to call it from an endpoint in my existing FastAPI app:
    import ray
    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel

    app = FastAPI()

    class Doc(BaseModel):
        text: str  # request body: a single 'text' field

    @app.post("/extract_keywords")
    async def extract_keywords(doc: Doc):
        if not doc.text:
            raise HTTPException(
                status_code=400,
                detail={"error": "No text provided", "hint": "Please include a non-empty 'text' field in the request body."},
            )
        # this is the call I'm unsure about: keyword_extractor here is the bound app, not a handle
        keywords_sim = await keyword_extractor.keyword_extract.remote(doc.text)
        keywords_sim = ray.get(keywords_sim)
        keywords = [kw[0] for kw in keywords_sim]
        return {"keywords": keywords}
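In case it helps clarify what I'm after, this is the kind of wiring I'm imagining for the second approach (a minimal sketch, assuming Ray 2.7+ DeploymentHandle semantics and that Serve is started from the same process as my backend; it reuses keyword_extractor and Doc from above, and I don't know if this is the intended way):

    import ray
    from contextlib import asynccontextmanager
    from fastapi import FastAPI
    from ray import serve

    handle = None  # DeploymentHandle, filled in at startup

    @asynccontextmanager
    async def lifespan(app: FastAPI):
        global handle
        ray.init()  # or ray.init(address="auto") to attach to an existing cluster
        # serve.run deploys the bound application and (as far as I understand) returns a handle
        handle = serve.run(keyword_extractor)
        yield
        serve.shutdown()
        ray.shutdown()

    app = FastAPI(lifespan=lifespan)

    @app.post("/extract_keywords")
    async def extract_keywords(doc: Doc):
        response = handle.keyword_extract.remote(doc.text)  # DeploymentResponse
        keywords_sim = await response  # awaiting gives the result directly, no ray.get
        return {"keywords": [kw[0] for kw in keywords_sim]}

In particular, I'm not sure whether serve.run is meant to be called from inside the backend process like this, or whether I should deploy the model separately and connect to it with serve.get_app_handle / serve.get_deployment_handle.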