1. Severity of the issue: (select one)
High: Completely blocks me.
2. Environment:
- Ray version: 2.44.1
- Python version: 3.10.13
- OS: Ubuntu 22.04
- Cloud/Infrastructure:
- Other libs/tools (if relevant):
3. What happened vs. what you expected:
- Expected: Scaling out with Ray Serve improves request throughput over a plain FastAPI application.
- Actual: No throughput gain from Ray Serve over a plain FastAPI application.
I am trying to scale my application using FastAPI and Ray Serve. I am following the documentation, but I cannot get any throughput gains from Ray Serve over a plain FastAPI application: throughput with Ray Serve is the same as with FastAPI alone (even slightly worse), and scaling with NUM_REPLICAS=2 does not improve it when testing on my laptop. What am I missing? I expected Ray Serve to improve throughput. A sketch of how I measured throughput is below.
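This is roughly how I measured throughput (a minimal sketch; the host/port are the local Serve defaults, and the concurrency and request counts are arbitrary values chosen for illustration):

# Minimal load-test sketch. Assumptions: the service is reachable at the
# local Serve default http://localhost:8000, and the concurrency/request
# counts are arbitrary. "content" is sent as a query parameter to match
# the handler signature in ray_server.py below.
import asyncio
import time

import httpx

URL = "http://localhost:8000/extract"


async def worker(client: httpx.AsyncClient, n: int) -> None:
    for _ in range(n):
        resp = await client.post(URL, params={"content": "some sample text"})
        resp.raise_for_status()


async def main() -> None:
    concurrency, per_worker = 32, 50
    async with httpx.AsyncClient(timeout=30) as client:
        start = time.perf_counter()
        await asyncio.gather(*(worker(client, per_worker) for _ in range(concurrency)))
        elapsed = time.perf_counter() - start
    total = concurrency * per_worker
    print(f"{total} requests in {elapsed:.1f} s -> {total / elapsed:.1f} req/s")


asyncio.run(main())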
Code:
ray_server.py
import os

from fastapi import FastAPI
from ray import serve

# MLService and ExtractionResult come from the application code (not shown here).

app = FastAPI()


@serve.deployment(
    # Environment variables are strings, so cast them explicitly; otherwise
    # num_replicas would be "2" instead of 2 whenever NUM_REPLICAS is set.
    num_replicas=int(os.environ.get("NUM_REPLICAS", 2)),
    ray_actor_options={
        "num_cpus": float(os.environ.get("NUM_CPU", 1)),
        "num_gpus": float(os.environ.get("NUM_GPU", 0)),
    },
)
@serve.ingress(app)
class Service:
    def __init__(self):
        self.ml = MLService()

    async def predict(self, text: str):
        return await self.ml.predict(text)

    @app.post("/extract")
    async def extract(self, content: str):
        response = await self.predict(text=content)
        return ExtractionResult(mentions=response)


def deployment(_args):
    return Service.bind()
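For completeness, minimal stand-ins for the names the file references but does not define (these are hypothetical placeholders; the real MLService and ExtractionResult live elsewhere in the project):

# Hypothetical placeholders for the application code referenced above;
# the real implementations are not part of this report.
from pydantic import BaseModel


class ExtractionResult(BaseModel):
    mentions: list[str]


class MLService:
    async def predict(self, text: str) -> list[str]:
        # Stand-in for the actual model inference.
        return [text]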
Deploying with:
serve run server.ray_server:deployment
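Since num_replicas is read from the environment when serve run imports the module, different replica counts can be tested by setting the variable on the same command (assuming the variable is set in the same shell, for example):

NUM_REPLICAS=4 serve run server.ray_server:deployment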