In the documentation of Ray 1.13 there was a great example on integrating serve with model registries describing a very good use case where there are a set of pre-trained models on object storage, and we can programmatically deploy them with:
Without having to go through clumsy YAML files, CLI calls through subprocess, or listing the entire graph of potentially thousands of deployments.
In my use case, I have an admin endpoint to add a trained model to serve for inference. The endpoint simply receives the HDFS path of where the model artifact is, and uses the old 1.13 API to ModelDeployer.deploy(hdfs_path)
that model. It’s beautifully simple and all happens within python.
Now, that great example is gone from the docs (at least i cannot find it), and with the 2.0 API, this is my understanding:
-
Every time the admin endpoint is called, it would have to take the entire deployment graph, make the necessary changes, and “re-deploy it”. The graph cannot just be maintained as a variable in an actor as this deployment might have multiple replicas, so it would have to be taken from the serve agent every single time.
-
If using the
run
python API, every single deployment on the graph gets re-deployed, regardless of whether it has been changed or not (not sure if this is by choice or a bug) -
The only way to just update one deployment is through the CLI submitting a yaml file. This adds a lot of complexity as before i had just one Python function and now i got serialization through YAML, plus i need to start a new CLI subprocess every time the admin endpoint gets called - clumsy!
-
In the event two models finish training at the same time and the endpoint is called twice, the second graph might not include the deployment just added from the first call, as when it asked Ray serve to provide the DAG, that deployment did not exist yet! Bad experience! Now i have to somehow create a queue and ensure requests are not submitted too fast.
Overall, the previous API was a lot more flexible on this kind of programmatic use case. I appreciate the new one is strong on model composition, however it feels like it is assuming someone is preparing a deployment manually through version control (YAML file) every single time, but this is not the case in many use-cases.
Can we start a conversation around de-deprecating the 1.x deploy()
API please? it will be a great loss to see it gone.