How severe does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
We are using DAGDriver to deploy a workflow consisting of multiple models.
Now we want to add another end point for health checking, just simply return 200 if DAGDriver is running OK.
However I couldn’t find from docs how to do it. I can think of indirect way of serve run multi-app, with a separate deployment for health checking. It somehow can check runtime health but not DAGDriver itself.
Currently DAGDriver
doesn’t provide a health check method out of the box. If you’re interested in this, could you submit a feature request on GitHub, so we can track it?
Assuming you have one DAGDriver
replica, you could health check it by either (1) subclassing the DAGDriver
with your own implementation that provides a health check endpoint and using that subclassed driver instead or (2) adding another deployment as you mentioned and health checking through that one.
Thank you @shrekris . Feature request was created at [Serve] Add a health check endpoint to DAGDriver · Issue #36152 · ray-project/ray · GitHub
In the mean time, we want to try subclassing DAGDriver as you commented. However, we’re new to Ray and after reading Ray internal source code, we are still not sure how to do it properly.
Could you kindly provide some pointer? Really appreciate your support.
Thanks for submitting the issue!
You could try subclassing the DAGDriver
and overwriting its __init__
method. For instance:
from fastapi import Depends
class MyDriver(DAGDriver):
async def __init__(self, dags, http_adapter):
super().__init__(self, dags, http_adapter)
@self.app.get("/driver-health-endpoint")
async def handle_request(inp=Depends(http_adapter)):
return "healthy"
Any request to the "/driver-health-endpoint"
route should simply return the "healthy"
string from the DAGDriver
. You can use that route as a health check endpoint.
1 Like
Thank you. It looks good for our purpose, will test it out.
Thank you for the awesome discussion. I’m facing the same issue too!
I’ve tried to follow you subclass example, but no matter what I tried, just need to subclass DAGDriver
leads to this error, can you guide me how to get pass through it? Thank you
raise RuntimeError(
RuntimeError: The Deployment constructor should not be called directly. Use `@serve.deployment` instead.
Ah I see, that’s because the DAGDriver
is decorated with @serve.deployment
in the Serve codebase. When you import it, you’re not importing the DAGDriver
; you’re importing a deployment that contains the DAGDriver
. The simplest workaround is to copy and paste the DAGDriver
code into your own codebase, remove the decorator, and subclass it there.
1 Like