Deploy, delete and use deployments in Ray Serve 2.0.0

How severely does this issue affect your experience of using Ray?

  • Medium: It causes significant difficulty in completing my task, but I can work around it.

Hello,

I have a service that uses RayServe 1.13 to serve machine learning models.
My service needs to dynamically deploy and delete deployments.
My service calls the models through a `RayServeHandle`, retrieved by passing the deployment name to `serve.get_deployment()`.
Because RayServe 1.x does not restore deployments after a restart (at least not as a stable feature), we run a service that programmatically starts and watches RayServe, and redeploys the necessary deployments.

I want to get the improvements in Ray 2.x, but many of the RayServe Python APIs are now deprecated, and I am concerned about how to proceed and what to expect from the future Ray Serve API.

So, for Ray 2.0, some of my questions are:

  1. How can I delete a deployment using the Python API (assuming that I only have the deployment name)?
    In Ray 1.13, I could do something like: `myapp = serve.get_deployment(name); myapp.delete()`.

  2. Is there a way to get the `RayServeHandle` by the deployment name?
    In Ray 1.13, I could do: `myhandle = serve.get_deployment(name).get_handle()`.

I would really appreciate any help.
Thanks!

=======

Hello,

I decided to write additional details about my scenario, after some time looking at Ray documentation and code.

As I said before, my service needs to dynamically deploy and undeploy models. Let’s assume there will be hundreds of different models that are unrelated to each other; they are not part of a single graph.

In Ray 1.13, I can manage all of these models using something like `deployment.deploy()` and `deployment.delete()`.

In Ray 2.0.0, if I understood correctly, those APIs are now deprecated and I should use `serve.run(…)`. However, in order to use `serve.run(…)`, I would have to build one big graph with all the models, and all of them would be redeployed even if only one was changed or added.
In addition, `serve.run(…)` seems to be considered a dev workflow, so not suited for production?

So… I miss an official 2.0.0 API to deploy and delete a single deployment, which could be a single model or a graph. I am not talking about creating one big graph out of all the individual deployments.

My service is not in production yet, but it was really easy to build using RayServe. However, I am really worried about the new version and whether I will be able to keep using it in the way described here.

Thanks again!!

I’m in the same thought process. I have x number of unrelated models that don’t necessarily fit in a graph.
Building on what @luisp said,
I want to understand how to build the following server/workflow:
Assume you want to serve 5 models using Ray Serve and FastAPI. I want to have a single separate file that does the individual deployment of all these models on a cluster.
I’ll have a second FastAPI server file that will call `get_handle`, process requests, etc. From what I understand, `serve.run` is used for a single deployment.

In 1.13, I used to call `.deploy()` on all my ClassNodes in `server.py` and then call `get_handle`, process the request, etc. in the particular endpoint.
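For what it’s worth, the two-file layout described above can be sketched in plain Python. This is not Ray code: `HandleRegistry`, the model name, and the lambda are all illustrative stand-ins, with real `RayServeHandle`s modeled as ordinary callables so the routing logic stands on its own.

```python
# Illustrative sketch of a name -> handle registry that a FastAPI layer could
# query. Real RayServeHandles are replaced by plain callables here.
class HandleRegistry:
    def __init__(self):
        self._handles = {}

    def register(self, name, handle):
        """Record the handle produced when a model is deployed."""
        self._handles[name] = handle

    def unregister(self, name):
        """Forget a handle when its deployment is deleted."""
        self._handles.pop(name, None)

    def get(self, name):
        if name not in self._handles:
            raise KeyError(f"no deployment named {name!r}")
        return self._handles[name]


registry = HandleRegistry()
registry.register("model_a", lambda x: x * 2)  # stand-in for a RayServeHandle
print(registry.get("model_a")(21))             # → 42
```

The point of the sketch is only that the deployment file populates the registry and the FastAPI file looks handles up by name, independently of how each model was deployed.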

Thanks

Hi @luisp! Thank you for asking the question! Have you checked out the 1.x to 2.x API Migration Guide — Ray 2.0.0?

In short, you can deploy everything with `serve.run()`; Serve will keep the old running deployments and remove the unnecessary ones. No redeploy happens.

cc: @plum9, let me know if that answers your question too.

Hi @Sihan_Wang !

Thanks for your response! Yes, I have read the Migration Guide.

As far as I can see in the code, `serve.run(…)` will redeploy all deployments, even if they haven’t changed. This seems to be confirmed by @architkulkarni in this post. However, even if `serve.run(…)` checked for changes, every time I wanted to update a single model among hundreds of independent deployments, I would have to build and submit a big graph with hundreds of deployments. And if I had two concurrent requests to update two distinct models, I would probably run into trouble.

If Ray Serve is used to serve a set of related deployments that change together, deploying this “big graph” makes sense to me. But if each model may change independently, and even frequently, I would prefer an API to create, update, and delete individual deployments (as in Ray 1.13).
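To make the concern concrete, here is a minimal sketch (plain Python, not Ray code; `plan_update` and the version strings are illustrative) of the “only touch what changed” reconciliation that an individual-deployment API would allow:

```python
# Model the running system and the desired state as name -> version maps,
# and compute the minimal set of actions needed to reconcile them.
def plan_update(current: dict, desired: dict):
    to_deploy = {name: ver for name, ver in desired.items()
                 if current.get(name) != ver}
    to_delete = [name for name in current if name not in desired]
    return to_deploy, to_delete


current = {"model_a": "v1", "model_b": "v1", "model_c": "v1"}
desired = {"model_a": "v2", "model_b": "v1"}  # update one model, drop another

to_deploy, to_delete = plan_update(current, desired)
print(to_deploy)  # → {'model_a': 'v2'}
print(to_delete)  # → ['model_c']
```

With hundreds of independent models, the desire is for Serve to act only on `to_deploy` and `to_delete`, rather than resubmitting all deployments on every change.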

Hi @luisp, that is correct for the dev workflow: Serve will redeploy everything when you use `serve.run()`.
On the Serve side, we want a single snapshot (source of truth) that controls all the deployments in production. You can have a single YAML file to manage your deployments, and add/remove deployments from there.
In the production workflow, you just deploy the YAML file to Serve (the file can be generated easily, after you finish your dev testing, with the `serve build` CLI). This makes it easier to handle version control and manage the whole system: Serve will deploy the new models and keep the old models running without redeploying them.
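For reference, the config file in that workflow looks roughly like the following. This is a hedged sketch: the `import_path` value, the deployment names, and the `num_replicas` values are illustrative, and the authoritative schema is whatever `serve build` emits for your own application in Ray 2.0.0.

```yaml
# Sketch of a Serve config file, as generated by `serve build` and submitted
# with `serve deploy config.yaml`. Names and values are illustrative.
import_path: app:graph        # module:attribute of the bound deployment graph

runtime_env: {}

deployments:
  - name: ModelA
    num_replicas: 2
  - name: ModelB
    num_replicas: 1
```

Editing this file (adding or removing entries under `deployments`) and redeploying it is the intended way to change the set of running deployments in the production workflow.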

  1. Is there a way to get the RayServeHandle by the deployment name?

I’ve been able to do this:

```python
from ray.serve.context import get_global_client

my_handle = get_global_client().get_handle(model_name, missing_ok=True, sync=True)
```

But what if I want to add/remove deployments programmatically, not submit a YAML file? Then the new API seems to be a step back from the previous one, right? There are use cases where you want to maintain a single Ray cluster and “many graphs” that are unrelated to each other, and other use cases where you have models saved in object storage and want to send a command to expose them, or to re-train existing models. The old API felt a lot more expressive.

Imagine this use case as well: I am the data science platform owner and have two groups of users. I want to make a single shared Ray cluster and Ray Serve instance available to these two groups, so I can manage a single cluster with a single set of metrics, a single cluster to auto-scale, GPU sharing across the two groups, etc.

The two groups of users do not know or care about the details of the other group’s deployments. Each group gets a /prefix endpoint, and they can use the cluster independently.

The 1.x API makes all of the above trivial; API v2 makes it very difficult. I honestly liked API v1 a lot more, as API v2 is only good for a single deployment with dependencies, deployed by a single user group on a dedicated cluster.
