I know that Ray integrates well with FastAPI. I have already worked through the
documentation examples and even some of my own, which I deployed both
locally and on Google Cloud Compute machines.
However, the question I have at the moment is: how do I include Ray
Serve in an already very large FastAPI web app that, up to this point,
had no need to serve models?
We have a FastAPI app that implements several other features unrelated
to Machine Learning.
It is important to reuse this structure because:
- Authentication goes here;
- Logging to the database;
- Other features.
The first option would be to simply serve the models in a separate
cluster and have the first app make requests to it. However, this
does not seem very elegant: we would have our first API
making requests to a second API.
The second option would be to somehow extend our FastAPI app using
Ray. However, I am unsure how this is done, and documentation/tutorials
seem scarce on this topic.
I know that if we already have an app, we can extend it by simply doing:

```python
from fastapi import FastAPI
from ray import serve

app = FastAPI()

@serve.deployment
@serve.ingress(app)
class MyModelServer:
    def __init__(self, ...):
        pass

    @app.post("/predict_something")
    async def predict_something(self, ...):
        ...

my_app = MyModelServer.bind()
```
However, I am under the impression that, in order for this to work, we
would have to deploy it using:
```shell
ray start --head
serve build ray_serve:my_app -o serve_config.yaml
serve deploy serve_config.yaml
```
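If I understand correctly, `serve build` would generate a config roughly like the following (a sketch from memory, assuming the deployment graph lives in a `ray_serve.py` module; the exact fields may differ between Ray versions):

```yaml
# serve_config.yaml (sketch of what `serve build` emits)
proxy_location: EveryNode
http_options:
  host: 0.0.0.0
  port: 8000
applications:
- name: app1
  route_prefix: /
  import_path: ray_serve:my_app
  deployments:
  - name: MyModelServer
```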
This would cause the app to be managed by Ray inside a cluster, but I
am unsure what advantages/disadvantages this would entail
compared to our current workflow.
Moreover, our app is not so simple. The structure is more or less:

```
.
└── api/
    └── app/
        ├── routers/
        │   ├── __init__.py
        │   ├── instances.py
        │   └── tasks.py
        └── main.py
```
Where, for example, in `tasks.py` we have a router with several routes:

```python
router = APIRouter(
    prefix="/tasks",
    tags=["tasks"],
)

@router.post(
    "/some_task",
    status_code=fastapi.status.HTTP_202_ACCEPTED,
    response_model=schemas.TaskStatus,
)
async def some_task(...):
    ...

# all other functions
```
And then, in our `main.py` we have:

```python
app = FastAPI(title="MyAppTitle")
app.include_router(tasks.router)
# Other routers
use_route_names_as_operation_ids(app)

@app.on_event("startup")
async def startup_event():
    pass
```
How would we include Ray Serve in this scheme? Could it be something like:
```
.
└── api/
    └── app/
        ├── routers/
        │   ├── __init__.py
        │   ├── instances.py
        │   ├── ray_serve_tasks.py
        │   └── tasks.py
        └── main.py
```
`ray_serve_tasks.py` would have the routes for the machine
learning jobs. Something like:

```python
# --DECORATOR--
class MyModelServer:
    def __init__(self, ...):
        pass

    @app.post("/predict_something")
    async def predict_something(self, ...):
        ...
```
`--DECORATOR--` would have to be something other than
`serve.ingress(app)`. What would it be, in this case? How would
deployment work? Can we have a Ray cluster managing just the requests
to these classes? Or do we need to have the entire app inside a Ray
cluster? Is there some other best practice that I am missing?