Unable to get started with Ray Serve + FastAPI

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it (will prototype without Ray Serve).

Hi! I’m struggling to get started with Ray Serve for serving my PyTorch models. Here is what I’m trying to achieve:

  • I have a FastAPI server that serves user requests and interacts with a database to support the required business logic.
  • Whenever a user hits the /api/test endpoint, the server should run inference with a PyTorch model, apply some business logic to the result, and respond to the client.
  • I would like to retain the other routes and auth middleware already active on the FastAPI server.

I read the following resources:

  • docs here
  • Ray: Serve ML models docs (Link redacted because I’m a new user and can’t post more than 2 links)
  • And found a few interesting examples online (e.g. this one)

I tried to start with a simple test and used the following code from the docs, saved in a file called test.py:

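import ray
import requests
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
from ray import serve

app = FastAPI()
# NOTE: This is my custom addition to serve some static pages.
app.mount("/static", StaticFiles(directory="static"), name="static")

# Expose the FastAPI app through a Serve deployment under /hello.
@serve.deployment(route_prefix="/hello")
@serve.ingress(app)
class MyFastAPIDeployment:
    @app.get("/")
    def root(self):
        return "Hello, world!"

    @app.post("/{subpath}")
    def hello_subpath(self, subpath: str):
        return f"Hello from {subpath}!"

serve.run(MyFastAPIDeployment.bind())
resp = requests.post("http://localhost:8000/hello/Serve")
assert resp.json() == "Hello from Serve!"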

When running python test.py, I see the following:

2023-01-06 18:15:14,000 INFO worker.py:1529 -- Started a local Ray instance. View the dashboard at
(ServeController pid=19356) INFO 2023-01-06 18:15:15,660 controller 19356 http_state.py:129 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-a90809961c7f36842d67eba6574e31d92c448022b3263c841b3b1b11' on node 'a90809961c7f36842d67eba6574e31d92c448022b3263c841b3b1b11' listening on ''
(HTTPProxyActor pid=19944) INFO: Started server process [19944]
(ServeController pid=19356) INFO 2023-01-06 18:15:16,320 controller 19356 deployment_state.py:1310 - Adding 1 replica to deployment 'MyFastAPIDeployment'.
(HTTPProxyActor pid=19944) INFO 2023-01-06 18:15:19,354 http_proxy http_proxy.py:361 - POST /hello 200 8.0ms
(ServeReplica:MyFastAPIDeployment pid=1336) INFO 2023-01-06 18:15:19,352 MyFastAPIDeployment MyFastAPIDeployment#AFMcPK replica.py:505 - HANDLE call OK 4.0ms

But the server seems to die right after this gets executed, i.e. it does not live beyond the last assert. I also tried serve run test:MyFastAPIDeployment, but with no luck.

My expectation was that, with some command, a FastAPI server would spin up and stay up, serving my requests. Is my mental model wrong? Is there a working example of a FastAPI server serving a model with some minimal/toy business logic and non-model endpoints?

Thank you and sorry if this is a very basic/dumb question.

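I was able to find what I was doing wrong: I needed to remove the serve.run(MyFastAPIDeployment.bind()) call from test.py, define main = MyFastAPIDeployment.bind() instead, and then start the server with serve run test:main. As far as I can tell, serve.run returns as soon as the deployment is up, so the script ran to the end and the local Ray instance shut down with it, whereas serve run keeps the process alive.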
