Ray Serve with dynamic deployments

Hi,
We have a few questions related to the Serve component.
We have a Ray cluster deployed in our K8s cluster using Helm. We use it to automatically deploy models created by our clients and to run predictions in real time. The Ray client application (we call it the supervisor) does the following:

  • Deploy an API to add/remove models (which are then pulled from MLflow).
  • Consume data from a pipeline and call the models.

This mostly works well, but we are seeing some odd behaviors. Maybe we’re doing something wrong; some of it may be related to the 2.0.0 update.

  • We have trouble understanding how/where the deployments are placed. Multiple instances of our supervisor can run simultaneously (locally, and in dev/staging/prod). The first supervisor instance works correctly and deploys the API deployment (code shown below), but the second instance replaces the API deployment with its own. The logs say:
    INFO 2022-09-22 01:24:36,316 controller 120 deployment_state.py:1232 - Adding 1 replicas to deployment 'dev_api'.
    INFO 2022-09-22 01:25:33,853 controller 120 deployment_state.py:1232 - Adding 1 replicas to deployment 'local_api'.
    INFO 2022-09-22 01:25:38,212 controller 120 deployment_state.py:1257 - Removing 1 replicas from deployment 'dev_api'.
    
    Why is that? We want to keep both deployments running at the same time, and we gave them different names and route prefixes. The same happens with the model deployments.
  • Maybe linked to the previous question: sometimes, when a supervisor instance and its deployment are already running, a second instance cannot start because of an error in the API constructor (see below). The get_actor call does not find the actor created earlier because it is not in the same namespace (this is clearer in the code). Is the deployment started in another namespace? Why?
    A (dirty?) fix is passing the root namespace to the constructor (see the sketch just after this list), but there is surely a cleaner way.
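
For reference, the workaround currently looks roughly like this on the supervisor side (a minimal sketch; the API class itself is shown further down):

import ray

# Capture the supervisor's own namespace and pass it to the API constructor so that
# the replica can later find the "controller" actor (this is the "dirty" fix above).
root_namespace = ray.get_runtime_context().namespace
API.deploy(root_namespace)  # name/route_prefix come from the decorators shown below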

Other remarks:

  • We hope that you’ll keep supporting the standard Serve API (with .deploy() and serve.get_deployment(name).get_handle()). The new one, based on composable deployments and serve.run, does not fit our workflow well.
  • It should be made clearer in the docs that Serve must be started with http_options={"host": "0.0.0.0"}. We lost quite some time figuring out why the Serve port was not reachable from other pods in the K8s cluster.
  • It looks like autoscalerOptions is not configurable through the Helm values. Is that intended?

Thank you for your help and clarifications. I hope this isn’t too much for a single post; I’ll be happy to add details if needed.


Config:
KubeRay 0.3.0 - deployed in a K8s cluster using Helm
Ray client 2.0.0 - deployed in K8s and locally
The supervisor starts with the following instructions:

import ray
from ray import serve

ray.init(RAY_URL)  # RAY_URL is the cluster's Ray Client address (ray://...)
serve.start(detached=True, http_options={"host": "0.0.0.0"})

The API looks like this:

from fastapi import FastAPI

import ray
from ray import serve

app = FastAPI()


@serve.deployment(name=APP_NAME__API_NAME, route_prefix=APP_NAME__API_PREFIX)
@serve.ingress(app)
class API:
    def __init__(self, root_namespace: str):
        # These two namespaces turn out to be different, which is what surprises us
        print(ray.get_runtime_context().namespace, root_namespace)
        # Workaround: look up the controller actor in the namespace passed by the supervisor
        self.controller = ray.get_actor("controller", namespace=root_namespace)

    @app.post("/processors")
    def add_processor(self, processor: Processor):  # Processor is defined elsewhere
        ...  # Deploy a processor which will create and apply a model (sketched below)
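
For context, add_processor then creates one Serve deployment per model, roughly like this (a minimal sketch; processor.model_name is an illustrative field and ModelDeployment is the class shown in the next snippet):

# Hypothetical sketch: deploy a dedicated ModelDeployment for the requested model,
# with a per-environment, per-model name.
deployment_name = f"{ENV}_model_{processor.model_name}"
ModelDeployment.options(name=deployment_name).deploy(processor.model_name)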

And the model deployments:

import mlflow
from ray import serve


@serve.deployment(route_prefix=None)
class ModelDeployment:
    def __init__(self, model: str):
        mlflow.set_tracking_uri(MLFLOW_URL)
        # Load the model from the MLflow model registry
        self.model = mlflow.pyfunc.load_model(model_uri=f"models:/{model}")

    async def __call__(self, inputs: ...) -> ...:
        return self.model.predict(inputs)
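
And this is roughly how the pipeline side calls a deployed model (a minimal sketch using the 1.x handle API mentioned above; the deployment name and inputs are illustrative):

import ray
from ray import serve

inputs = ...  # whatever batch the pipeline produced

# Look up the model deployment by name and call it through a Serve handle.
handle = serve.get_deployment("dev_model_my_model").get_handle()
prediction = ray.get(handle.remote(inputs))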