Best practices for expanding an existing FastAPI app with Ray Serve

Hi everyone,

I know that Ray integrates well with FastAPI. I have already worked
through the documentation examples and built some of my own, which I
deployed both locally and on Google Cloud Compute Engine machines.

However, the question I have at the moment is: how do we include Ray
Serve in an already very large FastAPI web app that, up to this point,
has had no need to serve models?

More background context.

We have a FastAPI app that implements several features unrelated to
machine learning.

Why it is important to reuse this structure:

  • Authentication goes here;
  • Logging to the database;
  • Other features.

Option 1

The first option would be to simply serve the models in a separate Ray
cluster and have the existing app make requests to it. However, this
does not seem very elegant: our first API would end up making requests
to a second API.
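
Concretely, I imagine something like the sketch below living in the
existing app, where an endpoint just forwards the request over HTTP to
the separately deployed Serve application (the URL, port, and payload
shape here are made up):

# sketch of Option 1: the existing FastAPI app forwards requests to a
# separately deployed Ray Serve application over HTTP
import httpx
from fastapi import APIRouter

router = APIRouter(prefix="/ml", tags=["ml"])

# hypothetical address of the Serve deployment running elsewhere
SERVE_URL = "http://ray-serve-cluster:8000/predict_something"


@router.post("/predict_something")
async def predict_something(payload: dict):
    async with httpx.AsyncClient() as client:
        response = await client.post(SERVE_URL, json=payload)
    response.raise_for_status()
    return response.json()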

Option 2

The second option would be to somehow extend our FastAPI app using Ray
Serve. However, I am unsure how this is done, and documentation and
tutorials seem scarce on this topic.

I know that if we already have an app, we can extend it by doing
something like:

from fastapi import FastAPI
from ray import serve

app = FastAPI()


@serve.deployment
@serve.ingress(app)
class MyModelServer:
    def __init__(self):
        # load the model(s) here
        pass

    @app.post("/predict_something")
    async def predict_something(self):  # request parameters elided
        # run inference and return the result
        ...


my_app = MyModelServer.bind()

However, I am under the impression that, in order for this to work, we
would have to deploy it using:

ray start --head
serve build ray_serve:my_app -o serve_config.yaml
serve deploy serve_config.yaml

This would cause the app to be managed by Ray inside a cluster, but I
am unsure what advantages/disadvantages this would entail compared to
our current workflow.
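
As far as I understand, for local testing the same bound application
can also be started programmatically instead of going through the CLI,
roughly:

# sketch: deploying the bound application programmatically for local testing
from ray import serve

from ray_serve import my_app  # the bound MyModelServer from the snippet above

serve.run(my_app)  # deploys onto the local (or already connected) Ray cluster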

Moreover, our app is not so simple. The structure is more or less
something like:

.
└── api/
    └── app/
        ├── routers/
        │   ├── __init__.py
        │   ├── instances.py
        │   └── tasks.py
        └── main.py

Where, for example, in tasks.py we have a router with several routes:

router = APIRouter(
    prefix="/tasks",
    tags=["tasks"],
)


@router.post("/some_task",
             status_code=fastapi.status.HTTP_202_ACCEPTED,
             response_model=schemas.TaskStatus)
async def some_task():  # request parameters elided
    ...


# all other route functions

And then, in our main.py we have:

app = FastAPI(title="MyAppTitle")

app.include_router(tasks.router)
# Other routers

use_route_names_as_operation_ids(app)

@app.on_event("startup")
async def startup_event():
    pass

How would we include Ray Serve in this scheme? Could it be something like:

.
└── api/
    └── app/
        ├── routers/
        │   ├── __init__.py
        │   ├── instances.py
        │   ├── ray_serve_tasks.py
        │   └── tasks.py
        └── main.py

Where ray_serve_tasks.py would have the routers for the machine
learning jobs. Something like:

# --DECORATOR--
class MyModelServer:
    def __init__(self):
        pass

    @app.post("/predict_something")
    async def predict_something(self):  # request parameters elided
        ...

but the --DECORATOR-- would have to be something other than
serve.ingress(app). What would it be in this case? How would
deployment work? Can we have a Ray cluster manage just the requests
to these classes? Or do we need to have the entire app inside a Ray
cluster?

Option 3

Some other best practice that I am missing.

Option 1 is valid, but I agree it would be a bit inelegant.

As for option 2, could you take a look at this unit test? We test many FastAPI features with Serve, so perhaps one of those patterns could work well with your FastAPI setup.
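
As a quick illustration: Serve supports wrapping a FastAPI app that
already has routers included, so a deployment class can sit on top of
your existing routes. A minimal sketch (the names here are
illustrative, not from your codebase or the test file):

from fastapi import APIRouter, FastAPI
from ray import serve

router = APIRouter(prefix="/tasks", tags=["tasks"])


@router.get("/ping")
async def ping():
    # plain FastAPI route; it has no access to the deployment instance
    return {"status": "ok"}


app = FastAPI(title="MyAppTitle")
app.include_router(router)


@serve.deployment
@serve.ingress(app)
class MyModelServer:
    def __init__(self):
        self.model = None  # load the model here

    @app.post("/predict_something")
    async def predict_something(self):
        # routes defined on the deployment class can use self (e.g. the model)
        return {"result": "placeholder"}


my_app = MyModelServer.bind()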