[Serve] Is it possible to serve a model without running a Ray cluster?

My use case is the following:

I have a Ray cluster that users use to train, validate, and tune models, and to run custom user-defined logic against those models or combinations of them (defined with a deployment graph). The models are served with Ray Serve, and all of this happens on a single Ray cluster. At the end, each user picks a single model (or model graph) they want to keep using.

Now I want to let users run their UDF plus the selected model/model graph on any other platform, not just Ray (e.g. in a plain Python process). Is this possible?

AFAIK serve.run() spins up Ray actors under the hood. Is it possible to run the same deployments in a separate process, thread, or async loop instead, so that users don't need a running Ray cluster in their apps?
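
To make it concrete, here is a rough sketch of the kind of thing I'd like users to be able to run. Every name in it (`Doubler`, `AddOne`, `ModelGraph`, `user_defined_fn`, `run_local`) is made up for illustration, and it deliberately uses no Ray API at all. Whether Serve deployments can be executed like this is exactly what I'm asking.

```python
import asyncio


# Hypothetical stand-ins for trained models; these are NOT Ray Serve deployments.
class Doubler:
    async def __call__(self, x: float) -> float:
        return 2 * x


class AddOne:
    async def __call__(self, x: float) -> float:
        return x + 1


class ModelGraph:
    """Calls two models concurrently and sums their outputs,
    roughly what a small deployment graph would do."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    async def __call__(self, x: float) -> float:
        a, b = await asyncio.gather(self.left(x), self.right(x))
        return a + b


# Hypothetical user-defined function applied to the graph output.
def user_defined_fn(score: float) -> str:
    return "accept" if score >= 10 else "reject"


async def predict(graph: ModelGraph, x: float) -> str:
    return user_defined_fn(await graph(x))


def run_local(x: float) -> str:
    # Everything runs in one plain Python process on an asyncio loop.
    graph = ModelGraph(Doubler(), AddOne())
    return asyncio.run(predict(graph, x))


if __name__ == "__main__":
    print(run_local(3))
```

Ideally the model classes would be the same ones I already deploy with Serve, just instantiated and called directly like this.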

Bumping this, any help?