How severely does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
Hi,
If I deploy a named actor (e.g. name='title-predictor') via a Ray job submission, I currently kill the existing named actor and then deploy the latest code under the same actor name.
Currently, each named actor has its own runtime environment dependencies (its own pip requirements file), so we can have two named actors with conflicting dependencies on the same cluster. I understand that it’s recommended to have a single container per cluster in production to capture the environment, but having a separate runtime env per actor is such a wonderful idea that we are trying to see how far we can push it in production.
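To make the setup concrete, here is a minimal sketch of how the per-actor options could be assembled. The helper name named_actor_options, the second actor name, and the pinned package versions are hypothetical, and lifetime="detached" is my assumption about what is needed for the actor to outlive the job that created it.

```python
from typing import Any, Dict, List


def named_actor_options(name: str, pip_requirements: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for ActorClass.options(...) (hypothetical helper)."""
    return {
        "name": name,
        # Assumption: the actor should keep running after the submitting job exits.
        "lifetime": "detached",
        # Each actor gets its own pip dependencies via its runtime environment.
        "runtime_env": {"pip": list(pip_requirements)},
    }


# Two actors on the same cluster with conflicting (illustrative) pins.
title_opts = named_actor_options("title-predictor", ["scikit-learn==1.3.2"])
summary_opts = named_actor_options("summary-predictor", ["scikit-learn==1.1.3"])

# With Ray installed and connected to a cluster, each actor would be started
# roughly like this (TitlePredictor being a class decorated with @ray.remote):
#   TitlePredictor.options(**title_opts).remote()
```

The first start of each actor is where the few minutes of pip installation would be spent.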
However, if I update the code for an actor, redeploying it may take a few minutes while its requirements are installed in the cluster. So if I kill the existing actor named title-predictor and then redeploy another actor with the same name via the Jobs API, and that redeployment takes a few minutes, there won’t be any named actor called title-predictor for that duration? And if some client has called title_actor = ray.get_actor('title-predictor') and then makes a call on title_actor, that call will fail if it is made within those few minutes between the kill of the old actor and the availability of the new one?
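For what it’s worth, the only client-side mitigation I can think of is to poll for the name until the new actor registers. Below is a minimal sketch of that idea, not something I am claiming is the recommended approach. The helper wait_for_actor and its default timeouts are hypothetical; it relies on ray.get_actor raising ValueError when no actor with that name exists, and takes the lookup function as a parameter so it can be tried without a cluster.

```python
import time
from typing import Any, Callable


def wait_for_actor(
    lookup: Callable[[str], Any],
    name: str,
    timeout_s: float = 300.0,
    poll_s: float = 2.0,
) -> Any:
    """Poll `lookup(name)` until it stops raising ValueError or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # With Ray, `lookup` would be ray.get_actor.
            return lookup(name)
        except ValueError:
            # The name is not registered (yet), e.g. mid-redeploy.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"actor {name!r} not available after {timeout_s}s")
            time.sleep(poll_s)


# Intended usage with Ray (not executed here):
#   title_actor = wait_for_actor(ray.get_actor, "title-predictor")
```

This only covers the lookup. A handle obtained before the kill would still point at the dead actor, so the client would have to look the name up again, and callers would still be blocked for the whole redeploy window.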
So, is there a way to do a rolling upgrade of named actors? Is this a design smell? The way I was thinking of doing this is that the code for each actor lives in a different git repo. When its CI/CD pipeline runs, it’ll redeploy the named actor in an existing Ray cluster. But my challenge is that the actor will be unavailable for those few minutes, as mentioned above. If there are other ways of deploying named actors, let me know.
I understand that Ray Serve has rolling upgrades (I think I read it on the Ray blog). That’s great, but inside the same Ray cluster I like the Ray actor interface, and I find named actors very intuitive for data scientists (and engineers) to use. A data scientist does not have to worry about FastAPI, Serve deployments, serve run, or bind, for example.