Ray Serve with dynamic deployments

Hi,
We have a few questions related to the Serve component.
We have a Ray cluster deployed in our K8s cluster using Helm. We use it to automatically deploy models created by our clients and to run predictions in real time. The Ray client application (we call it the supervisor) does the following:

  • Deploy an API to add/remove models (which are then pulled from MLflow).
  • Consume data from a pipeline and call the models.

This mostly works well, but we are seeing some odd behaviors. Maybe we’re doing something wrong; some of it may be related to the 2.0.0 update.

  • We have trouble understanding how/where the deployments are placed. Multiple instances of our supervisor can run simultaneously (locally, and in dev/staging/prod). The first supervisor instance works correctly and deploys the API deployment (code shown below), but the second instance replaces the API deployment with its own. The logs say:
    INFO 2022-09-22 01:24:36,316 controller 120 deployment_state.py:1232 - Adding 1 replicas to deployment 'dev_api'.
    INFO 2022-09-22 01:25:33,853 controller 120 deployment_state.py:1232 - Adding 1 replicas to deployment 'local_api'.
    INFO 2022-09-22 01:25:38,212 controller 120 deployment_state.py:1257 - Removing 1 replicas from deployment 'dev_api'.
    
    Why is that? We want to keep both deployments running at the same time, and we gave them different names and route prefixes. The same happens with the model deployments.
  • Maybe linked to the previous question: sometimes, when a supervisor instance and its deployment are already running, a second instance cannot start because of an error in the API constructor (see below). The get_actor call does not find the actor created earlier because it is not in the same namespace (this is clearer in the code). Is the deployment started in another namespace? Why?
    A (dirty?) fix is passing the root namespace to the constructor (see the sketch just after this list), but there is surely a cleaner way.
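
For reference, the workaround currently looks roughly like this on the supervisor side (a minimal sketch; the API class itself is shown further down):

import ray

# Capture the supervisor's own namespace and pass it to the API constructor so that
# the replica can later find the "controller" actor (this is the "dirty" fix above).
root_namespace = ray.get_runtime_context().namespace
API.deploy(root_namespace)  # name/route_prefix come from the decorators shown below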

Other remarks:

  • We hope that you’ll keep supporting the standard Serve API (with .deploy() and serve.get_deployment(name).get_handle()). The new one, based on composable deployments and serve.run, does not fit our workflow well.
  • It should be made clearer in the docs that Serve must be started with http_options={"host": "0.0.0.0"}. We lost quite some time figuring out why the Serve port was not reachable from other pods in the K8s cluster.
  • It looks like autoscalerOptions is not configurable through the Helm values. Is that intended?

Thank you for your help and clarifications. I hope this isn’t too much for a single post; I’ll be happy to add details if needed.


Config:
KubeRay 0.3.0 - deployed in a K8s cluster using Helm
Ray client 2.0.0 - deployed in K8s and locally
The supervisor starts with the following instructions:

import ray
from ray import serve

ray.init(RAY_URL)  # RAY_URL is the cluster's Ray Client address (ray://...)
serve.start(detached=True, http_options={"host": "0.0.0.0"})

The API looks like this:

from fastapi import FastAPI

import ray
from ray import serve

app = FastAPI()


@serve.deployment(name=APP_NAME__API_NAME, route_prefix=APP_NAME__API_PREFIX)
@serve.ingress(app)
class API:
    def __init__(self, root_namespace: str):
        # These two namespaces turn out to be different, which is what surprises us
        print(ray.get_runtime_context().namespace, root_namespace)
        # Workaround: look up the controller actor in the namespace passed by the supervisor
        self.controller = ray.get_actor("controller", namespace=root_namespace)

    @app.post("/processors")
    def add_processor(self, processor: Processor):  # Processor is defined elsewhere
        ...  # Deploy a processor which will create and apply a model (sketched below)
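
For context, add_processor then creates one Serve deployment per model, roughly like this (a minimal sketch; processor.model_name is an illustrative field and ModelDeployment is the class shown in the next snippet):

# Hypothetical sketch: deploy a dedicated ModelDeployment for the requested model,
# with a per-environment, per-model name.
deployment_name = f"{ENV}_model_{processor.model_name}"
ModelDeployment.options(name=deployment_name).deploy(processor.model_name)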

And the model deployments:

import mlflow
from ray import serve


@serve.deployment(route_prefix=None)
class ModelDeployment:
    def __init__(self, model: str):
        mlflow.set_tracking_uri(MLFLOW_URL)
        # Load the model from the MLflow model registry
        self.model = mlflow.pyfunc.load_model(model_uri=f"models:/{model}")

    async def __call__(self, inputs: ...) -> ...:
        return self.model.predict(inputs)
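
And this is roughly how the pipeline side calls a deployed model (a minimal sketch using the 1.x handle API mentioned above; the deployment name and inputs are illustrative):

import ray
from ray import serve

inputs = ...  # whatever batch the pipeline produced

# Look up the model deployment by name and call it through a Serve handle.
handle = serve.get_deployment("dev_model_my_model").get_handle()
prediction = ray.get(handle.remote(inputs))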