Ray Serve with vs without FastAPI

I'm new to Ray Serve (I've been using Ray/RLlib/Tune for a while) and have successfully followed the doc tutorials, including the FastAPI example. My use case is straightforward model deployment and serving. My question is: what benefits do I get from the FastAPI example vs. “pure” Ray Serve? Why use one over the other? Which, if either, is faster? Or is the FastAPI example there for folks who already have an extensive, deployed FastAPI codebase and now want to add Ray to the mix for the speed gains of distributed model inference?

Thanks in advance for any insights! I'm just starting to get my head around Ray Serve and how I could use it for my potential use cases and applications.

Really great question, @akelloway 🙂

The main benefit of using Ray Serve with FastAPI is that you get the full flexibility of its (awesome) HTTP framework. That means you can define pydantic types and FastAPI will automatically validate incoming requests and parse them into those types; you can easily define multiple routes and variable paths; you can use its dependency injection system for things like DB connections; you get an auto-generated OpenAPI spec; and so on. Ray Serve’s built-in HTTP server is lower-level and doesn’t offer these convenience features right now, so you will need to handle things like request parsing and validation yourself.

I would say a good rule of thumb is if you’re just doing model serving on Ray Serve, using the built-in server is probably “good enough” and will “serve” you well. If you want to build a fully-featured scalable web application, using FastAPI and scaling out the backends probably makes sense. Hope this helps!

Note that we’re currently cooking up a plan to provide the best of both worlds with “native” support. We’ll track it in this issue for future readers: “[serve] use fastapi as backend?” (ray-project/ray issue #9869 on GitHub).

Oh, another reason to use the FastAPI support (perhaps not relevant to you, but useful for others): if you already have an existing FastAPI server and want to scale it up using Ray Serve, you can do that without changing the HTTP code.

@eoakes, thanks for the insightful answers! I now have a better understanding of the considerations for each approach. I'm not sure which is better suited for my application right now, but I'm in a better position to make that decision going forward. Thanks!