Using Ray to build web apps

Is there any documentation on how to use Ray to build web apps? I have found documentation on serving models, but I am wondering about regular web apps. Is a database even necessary with Ray, or would state simply be stored in an actor? How would one achieve persistence of that state?


Hi @mitar, great question. Ray is a general distributed computing framework, so there are really no limits to what you might build with it.

As for general web apps, this is definitely possible, but you might want to elaborate on what you would be using Ray for, since there are already plenty of tools in web development and in some cases there may be better alternatives than implementing a solution yourself. We developed Ray Serve because there was no great way to serve ML models at scale, but for tools like databases there is already a lot of software.

In any case, you should treat your Ray actors as ephemeral: you can keep cached state in them, but you should definitely store authoritative data in a database or some other persistent store. While there certainly are long-running Ray jobs, Ray was not designed for persistent storage.
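To make that concrete, here is a minimal sketch of the "cache in the actor, persist elsewhere" pattern. The SessionCache actor is my own illustration, and the JSON file is just a stand-in for a real database (Postgres, Redis, S3, etc.):

import json
import ray

ray.init()

@ray.remote
class SessionCache:
    """Keeps hot state in memory; persists it on demand.

    The JSON file stands in for an external database here --
    the actor itself should be treated as ephemeral.
    """
    def __init__(self, path="sessions.json"):
        self._path = path
        self._cache = {}

    def put(self, key, value):
        self._cache[key] = value

    def get(self, key):
        return self._cache.get(key)

    def flush(self):
        # Persist the cached state so a restarted actor can recover it.
        with open(self._path, "w") as f:
            json.dump(self._cache, f)

cache = SessionCache.remote()
cache.put.remote("user-1", {"theme": "dark"})
ray.get(cache.flush.remote())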


I was mostly curious whether there is already something like that out there to look at. I do not have any concrete needs for now. It is more that I wanted to see how my mind would be blown seeing how this is done or could be done. :slight_smile:

@eoakes has a demo for that! https://www.youtube.com/watch?v=8GTd8Y_JGTQ&t=11s

Ray talking to databases, scaling and serving ML models, online training, and using actors for state storage. It can definitely be done :wink:


Hey Mitar,

I would recommend using FastAPI as the API layer and then farming out computationally expensive work to Ray tasks (or using Ray actors for stateful services in the web app). This is also what we do to build the Anyscale service on top of Ray! For database access, I'd recommend asyncpg or SQLAlchemy.

Here is an example to get you started with Ray and FastAPI:

from fastapi import FastAPI
import ray
import sympy

app = FastAPI(title="Web backend", openapi_url="/openapi.json", docs_url="/docs")

@app.on_event("startup")
def startup_event():
    # Connect to (or start) Ray when the web server starts.
    ray.init(num_cpus=1)

@ray.remote
def integration(expression: str) -> str:
    # Runs in a Ray worker process, so a slow integration
    # does not block the FastAPI event loop.
    x = sympy.Symbol("x")
    return str(sympy.integrate(expression, x))

@app.get("/integrate")
async def integrate(expression: str):
    # Ray ObjectRefs are awaitable, so the handler stays async end to end.
    return {"integral": await integration.remote(expression)}

Then, assuming the file is saved as backend.py, start the app with

uvicorn backend:app --port 8080

and go to http://localhost:8080/docs in your browser to query the endpoint.

You can then, for example, build the frontend in React or any other JavaScript library you want.
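On the database side, here is a minimal sketch of what the asyncpg route could look like. The connection string and the users table are placeholders, not anything from the Anyscale service:

import asyncpg
from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
async def connect_db():
    # Create the pool once per process; the DSN here is a placeholder.
    app.state.pool = await asyncpg.create_pool(
        "postgresql://user:pass@localhost/mydb"
    )

@app.get("/users/{user_id}")
async def get_user(user_id: int):
    async with app.state.pool.acquire() as conn:
        row = await conn.fetchrow(
            "SELECT id, name FROM users WHERE id = $1", user_id
        )
    return dict(row) if row else {"error": "not found"}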

Best,
Philipp.


One thing I was hoping to achieve with Ray is to not have two program paths for short-lived and long-lived requests. In current web programming models you either respond quickly, directly in the request handler, or you queue a backend job which does the work on some worker. What has bitten me in the past is that it is hard to always know which requests are long or short, because it can depend on the particular request parameters. So I understand that I could use Ray for the "backend job running on a worker" scenario, but I would be curious to see whether Ray has low enough latency that one could simply process all requests in Ray, with Ray making sure the request-handler workers autoscale as needed (and as load requires).
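To make the idea concrete, a rough sketch of what that single code path could look like. The sleep-based workload is just a stand-in for work whose cost depends on the input; whether Ray's per-task latency is acceptable for the short requests is exactly my open question:

import time

from fastapi import FastAPI
import ray

app = FastAPI()
ray.init()

@ray.remote
def handle(q: str) -> dict:
    # The same code path serves both cheap and expensive requests:
    # Ray schedules each call on whatever worker has capacity, and
    # with the autoscaler the cluster can grow when expensive calls pile up.
    cost = len(q)             # stand-in for input-dependent work
    time.sleep(cost * 0.01)
    return {"echo": q, "cost": cost}

@app.get("/query")
async def query(q: str):
    # One handler for everything -- no separate background-job path.
    return await handle.remote(q)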


I'm interested in the same thing; however, I'm somewhat dubious about the actual value such a setup (FastAPI + Ray) would actually give. Maybe I'm wrong, but let me explain what came to my mind.

FastAPI is generally deployed via uvicorn workers on gunicorn. The number of workers is, from what I know, the number of processes running, which is generally set to the number of CPU cores or twice that number; this is what seems to give the best performance. Each worker runs the async Python functions, which are just plain Python async functions using the standard asyncio API.

If we add a Ray remote function to the program, that function will just run in another process. The machine resources stay the same, so effectively we are just doing some concurrent work here, which does not give, I think, much advantage. What I mean is that when we run a remote function, whatever happens inside that function will not release the GIL, and the whole function will run in a specific process. If we, for example, try to asynchronize a call to a database made with mongoengine or any other synchronous driver, we will not actually make that function release the GIL for asynchronous I/O.

Effectively, if we use just one machine, i.e. a local Ray, we are just asynchronizing functions which run on the same cores, which are already scaled up by the uvicorn workers. If we spread the Ray cluster over several machines instead, we could get better performance, simply because we add cores, but I would say that in this case we should have one FastAPI instance and then a lot of stuff running in the Ray cluster. I'm not sure if all this makes sense, maybe I'm missing something, but this is what seems to make sense to me.
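For what it's worth, the "runs in another process" part is easy to verify; each Ray task reports a PID different from the driver's (a quick check, nothing more):

import os
import ray

ray.init(num_cpus=2)

@ray.remote
def worker_pid() -> int:
    # Each Ray task executes in a separate worker process,
    # so it holds its own GIL, independent of the driver's.
    return os.getpid()

print("driver pid:", os.getpid())
print("worker pids:", ray.get([worker_pid.remote() for _ in range(4)]))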


Actually, I've just watched the interesting video about the FastAPI + Ray architecture at Anyscale. In that case, however, FastAPI is used as an API gateway to serve the business logic built with a set of services running on Ray. This is very interesting, but it makes me point out two things:

  1. This is a whole app made of a set of services; it is not a single microservice made with FastAPI and Ray, though it could be.
  2. As I see several services implementing the business logic, each service looks like a whole program, which means you are effectively running multiple Ray programs/scripts in the same cluster. That makes me think that, regarding a question I asked earlier, it is actually possible to run separate services in the same cluster, if they are meant to communicate with each other (see the sketch below).
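Something like the following sketch, with two made-up services as named actors discovering each other in one cluster, seems to be what the video implies is possible (the service names and methods are my own invention):

import ray

ray.init()

@ray.remote
class Inventory:
    def __init__(self):
        self._stock = {"widget": 3}

    def reserve(self, item: str) -> bool:
        if self._stock.get(item, 0) > 0:
            self._stock[item] -= 1
            return True
        return False

@ray.remote
class Orders:
    def place(self, item: str) -> str:
        # Discover the other service by name -- both live in the same cluster.
        inventory = ray.get_actor("inventory")
        ok = ray.get(inventory.reserve.remote(item))
        return "confirmed" if ok else "out of stock"

inventory = Inventory.options(name="inventory").remote()
orders = Orders.options(name="orders").remote()
print(ray.get(orders.place.remote("widget")))  # -> "confirmed"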