Add another endpoint to DAGDriver

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

We are using DAGDriver to deploy a workflow consisting of multiple models.
Now we want to add another endpoint for health checking that simply returns 200 if the DAGDriver is running OK.

However, I couldn’t find how to do this in the docs. One indirect approach I can think of is running multiple applications with serve run, with a separate deployment for health checking. That would check the health of the Serve runtime, but not of the DAGDriver itself.

Currently DAGDriver doesn’t provide a health check method out of the box. If you’re interested in this, could you submit a feature request on GitHub, so we can track it?

Assuming you have one DAGDriver replica, you could health check it by either (1) subclassing the DAGDriver with your own implementation that provides a health check endpoint and using that subclassed driver instead or (2) adding another deployment as you mentioned and health checking through that one.

Thank you @shrekris. The feature request was created at [Serve] Add a health check endpoint to DAGDriver · Issue #36152 · ray-project/ray · GitHub

In the meantime, we want to try subclassing DAGDriver as you suggested. However, we’re new to Ray, and after reading Ray’s internal source code we are still not sure how to do it properly.

Could you kindly provide some pointers? We really appreciate your support.

Thanks for submitting the issue!

You could try subclassing the DAGDriver and overriding its __init__ method. For instance:

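# Note: DAGDriver here must be the plain, undecorated class. The one exported
# by Ray Serve is wrapped by @serve.deployment and cannot be subclassed directly.

class MyDriver(DAGDriver):

    # __init__ must be a regular method; Python does not allow it to be async.
    def __init__(self, dags, http_adapter):
        # super() already binds self, so it must not be passed again.
        super().__init__(dags, http_adapter)

        # Register an extra route on the FastAPI app that DAGDriver creates.
        # The health check takes no request input, so it needs no http_adapter.
        @self.app.get("/driver-health-endpoint")
        async def handle_request():
            return "healthy"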

Any request to the "/driver-health-endpoint" route should simply return the "healthy" string from the DAGDriver. You can use that route as a health check endpoint.
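For reference, a probe against that route could look like the sketch below. It uses only the Python standard library. The `is_healthy` helper is hypothetical (it is not part of Ray), and the URL in the usage note assumes Serve is listening on its default HTTP address of localhost:8000 with the driver mounted at the root route prefix.

```python
import http.client
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True only if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, a malformed reply, or an
        # error status (HTTPError subclasses URLError) all count as unhealthy.
        return False
```

Under those assumptions, calling is_healthy("http://localhost:8000/driver-health-endpoint") from a monitoring script would report whether the driver replica answered.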


Thank you. It looks good for our purpose, will test it out.

Thank you for the awesome discussion. I’m facing the same issue too!
I’ve tried to follow your subclassing example, but no matter what I try, merely subclassing DAGDriver leads to this error. Can you guide me on how to get past it? Thank you.

    raise RuntimeError(
RuntimeError: The Deployment constructor should not be called directly. Use `@serve.deployment` instead.

Ah I see, that’s because the DAGDriver is decorated with @serve.deployment in the Serve codebase. When you import it, you’re not importing the DAGDriver; you’re importing a deployment that contains the DAGDriver. The simplest workaround is to copy and paste the DAGDriver code into your own codebase, remove the decorator, and subclass it there.
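The mechanism can be reproduced without Ray. The following is a toy model, not Ray's actual code: `Deployment`, `deployment`, `PlainDriver`, and `HealthDriver` are stand-ins written for illustration, and only the error message is taken from the traceback above.

```python
class Deployment:
    """Toy stand-in for the wrapper object a deployment decorator returns."""

    def __init__(self, target, name=None, config=None, _internal=False):
        if not _internal:
            raise RuntimeError(
                "The Deployment constructor should not be called directly. "
                "Use `@serve.deployment` instead."
            )
        self.target = target
        self.name = name


def deployment(cls):
    """Toy decorator: replaces the class with a Deployment instance."""
    return Deployment(cls, name=cls.__name__, _internal=True)


class PlainDriver:
    """Undecorated class, like a local copy with the decorator removed."""

    def routes(self):
        return ["/"]


# The decorated name is a Deployment instance, not a class.
Driver = deployment(PlainDriver)

try:
    # Python treats type(Driver), i.e. Deployment, as the metaclass and calls
    # Deployment("Broken", (Driver,), {...}), which hits the guard above.
    class Broken(Driver):
        pass
except RuntimeError as exc:
    subclass_error = str(exc)


# Subclassing the plain class works; decorate the subclass afterwards.
@deployment
class HealthDriver(PlainDriver):
    def routes(self):
        return super().routes() + ["/driver-health-endpoint"]
```

In this toy, the class statement for `Broken` fails with the same message as the traceback, while `HealthDriver` is built from the plain class and only then wrapped. With the copied DAGDriver, the same ordering would presumably apply: subclass the undecorated copy first, then apply @serve.deployment to the subclass.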
