Integrating GradioIngress and non-gradio endpoints

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I’m new to Ray, but so far so good! Using the Ray Serve doc examples, I’ve built a “Driver” ingress deployment that dispatches (by URL path) between two other deployments, one for “/embeddings” and one for “/inference”. It works well!

Driver.__init__() captures the deployments, and Driver.__call__() accepts a Request and dispatches between them based on request.url.path, e.g.:

        if request.url.path.startswith('/embeddings'):
            ref = await self.embedding.classify.remote(req)
        ...
        return await ref  #produces a Response
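
For anyone who wants to see the dispatch idea in isolation, here is a framework-free sketch of the prefix routing. It is not my actual Driver: StubHandle, its handle() method, and the 404 string are hypothetical stand-ins for the real deployment handles and Response objects, and the driver takes a plain path string instead of a Request.

```python
import asyncio


class StubHandle:
    """Hypothetical stand-in for a Serve deployment handle."""

    def __init__(self, name: str):
        self.name = name

    async def handle(self, path: str) -> str:
        # A real deployment would return a Response; a string is enough here.
        return f"{self.name} handled {path}"


class Driver:
    def __init__(self, embedding: StubHandle, inference: StubHandle):
        # Map each URL prefix to the handle that should serve it.
        self.routes = {"/embeddings": embedding, "/inference": inference}

    async def __call__(self, path: str) -> str:
        # Dispatch to the first handle whose prefix matches the path.
        for prefix, target in self.routes.items():
            if path.startswith(prefix):
                return await target.handle(path)
        return f"404: no route for {path}"


driver = Driver(StubHandle("embedding"), StubHandle("inference"))
print(asyncio.run(driver("/embeddings/abc")))
```

In this sketch a request to “/chat” falls through to the 404 branch, which is exactly the gap I’m trying to fill.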

I’d now like to serve a Gradio page under the “/chat” URL. The standalone deployment works like a charm:

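import random

import gradio as gr
from ray import serve
from ray.serve.gradio_integrations import GradioIngress

def gradio_builder():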
    def respond(message, user_bot_history):
        return random.choice(["Yes", "No"])
    iface = gr.ChatInterface(respond)
    # Without this, there is a constant polling loop of some kind
    iface.config["dev_mode"] = False
    return iface

@serve.deployment(
    #route_prefix="/chat",
    ray_actor_options={"num_cpus": 0},
    autoscaling_config={"min_replicas": 1, "max_replicas": 1},
)
class MyGradioServer(GradioIngress):
    def __init__(self):
        super().__init__(gradio_builder)

deploy = MyGradioServer.bind()

Now, finally, my question: how do I combine this deployment with my Driver ingress deployment so that the same server supports “/embeddings”, “/inference”, and “/chat”?

@shrekris could you take a look at this?

I moved the questions to the serve channel.