How severe does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
We are using DAGDriver to deploy a workflow consisting of multiple models.
Now we want to add another end point for health checking, just simply return 200 if DAGDriver is running OK.
However I couldn’t find from docs how to do it. I can think of indirect way of serve run multi-app, with a separate deployment for health checking. It somehow can check runtime health but not DAGDriver itself.
DAGDriver doesn’t provide a health check method out of the box. If you’re interested in this, could you submit a feature request on GitHub, so we can track it?
Assuming you have one
DAGDriver replica, you could health check it by either (1) subclassing the
DAGDriver with your own implementation that provides a health check endpoint and using that subclassed driver instead or (2) adding another deployment as you mentioned and health checking through that one.
Thank you @shrekris . Feature request was created at [Serve] Add a health check endpoint to DAGDriver · Issue #36152 · ray-project/ray · GitHub
In the mean time, we want to try subclassing DAGDriver as you commented. However, we’re new to Ray and after reading Ray internal source code, we are still not sure how to do it properly.
Could you kindly provide some pointer? Really appreciate your support.
Thanks for submitting the issue!
You could try subclassing the
DAGDriver and overwriting its
__init__ method. For instance:
from fastapi import Depends
async def __init__(self, dags, http_adapter):
super().__init__(self, dags, http_adapter)
async def handle_request(inp=Depends(http_adapter)):
Any request to the
"/driver-health-endpoint" route should simply return the
"healthy" string from the
DAGDriver. You can use that route as a health check endpoint.
Thank you. It looks good for our purpose, will test it out.
Thank you for the awesome discussion. I’m facing the same issue too!
I’ve tried to follow you subclass example, but no matter what I tried, just need to subclass
DAGDriver leads to this error, can you guide me how to get pass through it? Thank you
RuntimeError: The Deployment constructor should not be called directly. Use `@serve.deployment` instead.
Ah I see, that’s because the
DAGDriver is decorated with
@serve.deployment in the Serve codebase. When you import it, you’re not importing the
DAGDriver; you’re importing a deployment that contains the
DAGDriver. The simplest workaround is to copy and paste the
DAGDriver code into your own codebase, remove the decorator, and subclass it there.