I'm new to Ray Serve (I've been using Ray, RLlib, and Tune for a while) and have successfully followed the doc tutorials, including the FastAPI example. My use case is straightforward model deployment and serving. My question is: what benefits do I get from the FastAPI example versus “pure” Ray Serve? Why use one over the other? Which, if either, is faster? Or is the FastAPI example there for folks who already have an extensive deployed FastAPI code base and now want to add Ray to the mix for distributed model inference speed gains?
Thanks in advance for any insights! I'm just starting to get my head around Ray Serve and how I could use it for my potential use cases and applications.