Ray with FastAPI

Hello everyone

I am new to Ray.

I have followed the steps to configure a Ray cluster on AWS.

My challenge now is that I have an API built with FastAPI. One endpoint collects a request from the user and runs inference with a machine learning model.

I want to use Ray to distribute this ML model's workload across multiple nodes, which should speed up inference for large input data.

I want to be able to deploy the FastAPI app (along with the ML model service) somewhere, and when a request comes in for inference:

  1. In the FastAPI app, initialize Ray with ray.init() and connect to the remote Ray cluster.
  2. Call the function that runs the inference using Ray's .remote().
  3. The remote Ray cluster runs the workload in a distributed fashion across the head node and worker nodes created on AWS.
  4. FastAPI waits on the futures using Ray's ray.get(); once the results are ready, Ray sends them back and FastAPI responds to the user (rough sketch after this list).
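
Something like this minimal sketch is what I have in mind (the cluster address, the chunk size, and the predict function are just placeholders for illustration):

```python
import ray
from fastapi import FastAPI

app = FastAPI()

# Connect to the remote Ray cluster via Ray Client
# (placeholder address for my head node on AWS).
ray.init(address="ray://<head-node-address>:10001")

@ray.remote
def predict(chunk: list[float]) -> list[float]:
    # Placeholder for the real ML model inference.
    return [x * 2 for x in chunk]

@app.post("/infer")
def infer(data: list[float]) -> dict:
    # Fan the input out across the cluster in chunks.
    chunks = [data[i:i + 1000] for i in range(0, len(data), 1000)]
    futures = [predict.remote(c) for c in chunks]
    results = ray.get(futures)  # blocks until all tasks finish
    return {"result": [y for r in results for y in r]}
```

(I used a sync endpoint here so ray.get() blocks a threadpool worker rather than the event loop; Ray ObjectRefs are also awaitable, so in an async endpoint they could be awaited directly.)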

Is this possible with Ray and FastAPI?

Have you considered Ray Serve (Ray Serve: Scalable and Programmable Serving — Ray 2.9.0), which has FastAPI integration?
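
With Serve you can wrap the FastAPI app in a deployment that runs on the cluster, instead of calling ray.init() yourself. A minimal sketch of the integration (the deployment class, replica count, and model logic here are placeholders):

```python
from fastapi import FastAPI
from ray import serve

app = FastAPI()

@serve.deployment(num_replicas=2)  # replicas get scheduled across the cluster
@serve.ingress(app)
class ModelDeployment:
    def __init__(self):
        # Placeholder: load the real ML model here.
        self.model = lambda batch: [x * 2 for x in batch]

    @app.post("/infer")
    async def infer(self, data: list[float]) -> dict:
        return {"result": self.model(data)}

# Deploys the FastAPI app onto the running Ray cluster.
serve.run(ModelDeployment.bind())
```

Serve then handles routing requests to replicas, scaling, and placing them on the head and worker nodes, so you get the distributed execution you described without managing the futures yourself.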