Hi,
We have a few questions related to the serve component.
We have a Ray cluster deployed in our K8s cluster using Helm. We use it to automatically deploy models created by our clients and to run predictions in real time. The Ray client (we call it the supervisor) does the following:
- Deploy an API to add/remove models (which are then pulled from MLflow).
- Consume data from a pipeline and call the models.
This mostly works well, but we are seeing some strange behaviors. Maybe we're doing something wrong; some of them may be related to the 2.0.0 update.
- We have trouble understanding how/where the deployments are placed. Multiple versions of our supervisor can run simultaneously (locally, and in dev/staging/prod). The first supervisor instance works correctly and deploys the API deployment (code shown below), but the second instance will replace the API deployment with its own. The logs say:

INFO 2022-09-22 01:24:36,316 controller 120 deployment_state.py:1232 - Adding 1 replicas to deployment 'dev_api'.
INFO 2022-09-22 01:25:33,853 controller 120 deployment_state.py:1232 - Adding 1 replicas to deployment 'local_api'.
INFO 2022-09-22 01:25:38,212 controller 120 deployment_state.py:1257 - Removing 1 replicas from deployment 'dev_api'.

Why is that? We want to keep both deployments running at the same time; we gave them different names and route prefixes. Same for the model deployments.
- Maybe linked to the previous question: sometimes, when a supervisor instance and its deployment are already running, the second one cannot start because of an error in the API constructor (see below). The get_actor call does not find the actor created earlier because it is not in the same namespace (this is clearer in the code below). Is the deployment started in another namespace? Why?
A (dirty?) fix is to pass the root namespace to the constructor (sketch just below), but there is certainly a cleaner way.
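For reference, the supervisor side of that workaround looks roughly like this (a simplified sketch; the API class it deploys is shown at the bottom of the post):

import ray

# Capture the namespace the supervisor connected with and hand it to the
# replica, since the replica itself ends up in a different namespace.
root_namespace = ray.get_runtime_context().namespace
API.deploy(root_namespace)  # consumed by API.__init__ shown below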
Other remarks:
- We hope that you'll still support the standard Serve API (with .deploy() and serve.get_deployment(name).get_handle()). The new one, with composable deployments and serve.run, does not fit our workflow well (see the short sketch after this list).
- It should be made clearer in the docs that Serve must be started with http_options={"host": "0.0.0.0"}. We lost quite some time understanding why the Serve port was not reachable from another pod in the K8s cluster.
- It looks like autoscalerOptions is not configurable from the Helm values. Is that intended?
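To illustrate the first remark, here is a simplified sketch of the pattern we depend on: models are added and removed by name at runtime, using the ModelDeployment class shown at the end of the post (the model name is a placeholder):

from ray import serve

# Add a client's model at runtime...
ModelDeployment.options(name="client-model-a").deploy("client-model-a")
# ...and remove it again later, purely by name.
serve.get_deployment("client-model-a").delete()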
Thank you for your help and clarifications. I hope it's not too much for a single post; I'll be happy to add details if needed.
Config:
- KubeRay 0.3.0 - deployed in a K8s cluster using Helm
- Ray client 2.0.0 - deployed in K8s and locally
The supervisor starts with the following instructions:
import ray
from ray import serve

ray.init(RAY_URL)  # RAY_URL is the address of our Ray cluster (Ray client)
serve.start(detached=True, http_options={"host": "0.0.0.0"})  # host needed so other pods can reach the HTTP proxy
The API looks like this:
from fastapi import FastAPI
import ray
from ray import serve

app = FastAPI()

@serve.deployment(name=APP_NAME__API_NAME, route_prefix=APP_NAME__API_PREFIX)
@serve.ingress(app)
class API:
    def __init__(self, root_namespace):
        # These two namespaces are different, which is what surprises us.
        print(ray.get_runtime_context().namespace, root_namespace)
        # Workaround: look the actor up in the namespace passed by the supervisor.
        self.controller = ray.get_actor("controller", namespace=root_namespace)

    @app.post("/processors")
    def add_processor(self, processor: Processor):
        ...  # Deploy a processor which will create and apply a model
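For context, the two constants used in the decorator are derived per environment; a hypothetical illustration (ENV_PREFIX is a stand-in for however each instance is configured):

# Produces e.g. "dev_api" with prefix "/dev", or "local_api" with prefix "/local".
ENV_PREFIX = "dev"  # "local", "staging", "prod", ...
APP_NAME__API_NAME = f"{ENV_PREFIX}_api"
APP_NAME__API_PREFIX = f"/{ENV_PREFIX}"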
And the model deployments:
import mlflow
from ray import serve

@serve.deployment(route_prefix=None)
class ModelDeployment:
    def __init__(self, model: str):
        mlflow.set_tracking_uri(MLFLOW_URL)
        # Pull the client's model from the MLflow registry.
        self.model = mlflow.pyfunc.load_model(model_uri=f"models:/{model}")

    async def __call__(self, inputs: ...) -> ...:
        return self.model.predict(inputs)
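And the prediction path is roughly this, assuming a model was already deployed under that name through add_processor (the model name and the record are placeholders):

import ray
from ray import serve

# The pipeline consumer fetches the handle of a deployed model by name and
# calls it with the incoming data.
record = {"feature_a": 1.0, "feature_b": 2.0}
handle = serve.get_deployment("client-model-a").get_handle()
prediction = ray.get(handle.remote(record))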