How severely does this issue affect your experience of using Ray?
- Medium: It contributes significant difficulty to completing my task, but I can work around it.
Hi, I am trying out Ray Serve for a deployment scenario where I need to make a prediction on each sentence within a paragraph. Given an input paragraph, the expected output looks like this:
```json
[
  {
    "sentence": "",
    "news_type": "hoax",
    "score": 0.33035558462142944
  },
  {
    "sentence": "",
    "news_type": "hoax",
    "score": 0.4125480353832245
  }
]
```
I have three internal deployments, all async, which are used within the actual prediction endpoint:
```python
from typing import List, Tuple

from ray import serve


@serve.deployment(version="v0")
class AutoTokenizerDeployment:
    def __init__(self, model_name: str) -> None:
        ...

    async def __call__(self, sentence: str):
        # Return the tokenized sentence.
        ...


@serve.deployment(version="v0")
class SplitSentencesDeployment:
    async def __call__(self, text: str) -> List[str]:
        # Split the input text into sentences.
        ...


@serve.deployment(num_replicas=4, version="v0")
class ModelDeployment:
    def __init__(self, model_path: str) -> None:
        ...

    async def __call__(self, preprocessed) -> Tuple[float, Category]:
        # Return a (probability, Category) prediction for a tokenized sentence.
        ...
```
Question:
In the actual `/predict` endpoint, I get the handles of the previous deployments to make inferences. After referring to the documentation and a couple of examples such as this one, I have noticed that I need to call `await (await handle.remote())` multiple times in each loop iteration to achieve the desired output. Is this the correct way to do it, or is there a better way?
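That is, with an async handle (`sync=False`), each call appears to need two awaits. A minimal sketch of the pattern as I understand it (assuming Ray 1.x handle semantics, where `handle.remote()` must itself be awaited and yields an ObjectRef):

```python
obj_ref = await handle.remote(arg)  # first await: dispatch the call, get an ObjectRef
result = await obj_ref              # second await: resolve the ObjectRef to a value
```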
My composite deployment:

```python
from fastapi import FastAPI, status

from ray import serve

app = FastAPI()
# ModelInput / CompositeModelOutput are my Pydantic schemas (omitted here).


@serve.deployment(route_prefix="/")
@serve.ingress(app)
class NLPClassifierComposite:
    def __init__(self) -> None:
        self.tokenizer = AutoTokenizerDeployment.get_handle(sync=False)
        self.sentence_splitter = SplitSentencesDeployment.get_handle(sync=False)
        self.model = ModelDeployment.get_handle(sync=False)

    @app.post(
        "/predict",
        # response_model=List[CompositeModelOutput],
        # commented out: throws an error due to a bug in Ray
        status_code=status.HTTP_200_OK,
    )
    async def predict(self, payload: ModelInput):
        text = payload.text
        # Two awaits per call: dispatch the request, then resolve the ObjectRef.
        sentences = await (await self.sentence_splitter.remote(text))
        results = []
        for sentence in sentences:
            tokenized_sentence = await (await self.tokenizer.remote(sentence))
            probability, class_name = await (
                await self.model.remote(tokenized_sentence)
            )
            results.append(
                {
                    "sentence": sentence,
                    "news_type": class_name,
                    "score": float(probability),
                }
            )
        return results
```
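If the double await is indeed correct, one alternative I have been considering is to dispatch all per-sentence calls concurrently and gather them, instead of resolving each one inside the loop. A minimal sketch under the same assumptions as above (the `_classify` helper is my own, and I am assuming Ray ObjectRefs can be awaited by `asyncio.gather`):

```python
import asyncio


class NLPClassifierComposite:
    # ... __init__, handles, and decorators as above ...

    async def _classify(self, sentence: str) -> dict:
        # Pipeline for one sentence: tokenize, then classify.
        tokenized = await (await self.tokenizer.remote(sentence))
        probability, class_name = await (await self.model.remote(tokenized))
        return {
            "sentence": sentence,
            "news_type": class_name,
            "score": float(probability),
        }

    async def predict(self, payload: ModelInput):
        sentences = await (await self.sentence_splitter.remote(payload.text))
        # Fan out every per-sentence pipeline at once, then wait for all of them.
        return await asyncio.gather(*(self._classify(s) for s in sentences))
```

Would this be the recommended pattern, or does Ray Serve offer something more idiomatic for this kind of fan-out?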