How to run multiple deployments in Ray Serve 2.0

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

Hello, how can I deploy multiple models that are not in the same graph? For example, I want to deploy A and B and query them at url/a and url/b.
I tried using multiple serve.run() calls and multiple serve run file:deployment commands, but each new run removes the previous deployment.
Also, the RayService config has only one import_path. How can we deploy multiple services in one RayService?
Thanks


You can create independent deployments and bind them to a single DAGDriver. Then, you can serve.run()/serve run that single DAGDriver.

This section in the docs provides more details.

Yeah, although a single DAGDriver is a bit problematic if you have multiple endpoints (last example on the page linked above) and at least one of them has a FastAPI ingress: the paths don’t work out as you’d expect.


Incidentally, the dictionary form for constructing a DAGDriver isn’t mentioned in the general-purpose docs for 2.0 (as far as I can see), only in the migration guide.

When I was converting from Ray 1.11 to 2.0, I was still able to deploy multiple models like this:

Model1.options(init_args=model1_args, num_replicas=1, name="model1").deploy()

Model2.options(
    init_args=model2_args,
    num_replicas=num_reps,
    name="model2",
    ray_actor_options={"num_gpus": 1},
).deploy()

That being said, I was making direct calls on the models and not going through the URL-served endpoints.

I did eventually convert over to the DAG API, just to see how it works.

@PaulRudin I’m not sure about the “the paths don’t work out as you’d expect”, do you have an example?

For me these two options work:

Option 1:

import requests
from fastapi import FastAPI
from ray import serve

app = FastAPI()


@serve.deployment()
@serve.ingress(app)
class Serve:
    @app.get("/ping1")
    def ping1(self):
        return "pong1"

    @app.get("/ping2")
    def ping2(self):
        return "pong2"


serve.run(Serve.bind())
resp = requests.get("http://localhost:8000/ping1")
assert resp.json() == "pong1"
resp = requests.get("http://localhost:8000/ping2")
assert resp.json() == "pong2"

Option 2:

import requests
from ray import serve
from ray.serve.drivers import DAGDriver


@serve.deployment()
class Ping1:
    def __call__(self):
        return "pong1"


@serve.deployment()
class Ping2:
    def __call__(self):
        return "pong2"


d = DAGDriver.bind({"/ping1": Ping1.bind(), "/ping2": Ping2.bind()})
handle = serve.run(d)


resp = requests.get("http://localhost:8000/ping1")
assert resp.json() == "pong1"
resp = requests.get("http://localhost:8000/ping2")
assert resp.json() == "pong2"
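
To make the routing idea in Option 2 concrete without needing a running Serve instance, here is a minimal pure-Python sketch of a route-keyed dispatcher. It is only a conceptual model of "one driver, a dictionary of routes"; ToyRouteDriver and its handle() method are hypothetical names and are not part of Ray or how DAGDriver is implemented.

```python
from typing import Any, Callable, Dict


class ToyRouteDriver:
    """Hypothetical stand-in for a route-keyed driver (not a Ray class)."""

    def __init__(self, routes: Dict[str, Callable[[], Any]]) -> None:
        # Reject keys that do not look like URL paths.
        for prefix in routes:
            if not prefix.startswith("/"):
                raise ValueError(f"route must start with '/': {prefix!r}")
        self._routes = dict(routes)

    def handle(self, path: str) -> Any:
        # Exact-match lookup: each path is owned by exactly one callable.
        try:
            target = self._routes[path]
        except KeyError:
            raise LookupError(f"no deployment bound to {path!r}") from None
        return target()


def ping1() -> str:
    return "pong1"


def ping2() -> str:
    return "pong2"


# One driver owns both routes, mirroring the dictionary passed in Option 2.
driver = ToyRouteDriver({"/ping1": ping1, "/ping2": ping2})
```

Calling driver.handle("/ping1") invokes ping1, and an unknown path raises LookupError. The point of the sketch is only that both routes live behind a single entry point, which is why a single serve.run() is enough in Option 2.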

For reference, this doesn’t work, because the second serve.run() removes the deployment started by the first:

import requests
from ray import serve


# Each deployment gets its own route prefix so the requests below target it.
@serve.deployment(route_prefix="/ping1")
class Ping1:
    def __call__(self):
        return "pong1"


@serve.deployment(route_prefix="/ping2")
class Ping2:
    def __call__(self):
        return "pong2"


serve.run(Ping1.bind())
# This second call replaces Ping1 instead of running alongside it.
serve.run(Ping2.bind())

resp = requests.get("http://localhost:8000/ping1")
assert resp.json() == "pong1"
resp = requests.get("http://localhost:8000/ping2")
assert resp.json() == "pong2"
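
One way to picture the behavior reported in this thread, where each new serve.run() removes the previous deployment, is a controller that replaces its routing table on every run instead of merging into it. The sketch below is a toy model under that assumption; ToyServeController, run(), and get() are hypothetical names, not Ray APIs.

```python
from typing import Callable, Dict


class ToyServeController:
    """Hypothetical model: every run() replaces what was deployed before."""

    def __init__(self) -> None:
        self.routes: Dict[str, Callable[[], str]] = {}

    def run(self, routes: Dict[str, Callable[[], str]]) -> None:
        # Replace, do not merge: routes from earlier runs are dropped.
        self.routes = dict(routes)

    def get(self, path: str) -> str:
        handler = self.routes.get(path)
        return handler() if handler is not None else "404"


# Two separate runs: only the routes from the last run survive.
sequential = ToyServeController()
sequential.run({"/ping1": lambda: "pong1"})
sequential.run({"/ping2": lambda: "pong2"})

# One run that carries both routes: both stay reachable.
combined = ToyServeController()
combined.run({"/ping1": lambda: "pong1", "/ping2": lambda: "pong2"})
```

In this model, sequential.get("/ping1") returns "404" while combined.get("/ping1") returns "pong1", which matches the advice in the thread to put everything behind a single serve.run().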

See here:

Hi Ray Serve team, I would like to check whether Ray Serve currently supports the use case mentioned in the question: deploying multiple models via multiple serve run commands on different URL paths, for example serve run model1:deployment on /model1 and serve run model2:deployment on /model2. Thanks.

Hi @lizzzcai ,

As mentioned in this discussion, in 2.0 you can currently deploy multiple models only with a single serve run (see the 1.x to 2.x API Migration Guide in the Ray 2.0.0 docs).
The old API is still supported until we provide a new API for your use case. (How to run multiple deployments in ray serve 2.0 - #5 by puntime_error)

Thanks @Sihan_Wang for your reply. I just want to know whether this is already on the roadmap or whether I need to open an issue. Thanks.