How severely does this issue affect your experience of using Ray?
- Medium: It makes my task significantly harder to complete, but I can work around it.
We have developed a gRPC service on top of Ray Serve, but we are struggling to distribute the model so that it is already accessible when the service’s init method is invoked. We don’t want the model to live in external file storage (e.g. S3); instead, we want to deploy it directly to the Ray cluster. Ideally, Ray would handle the distribution automatically.
We have been exploring Ray objects, but they don’t seem like the right fit: they work at the level of Python objects, occupy memory, and are slow.
An alternative is to copy the model to the nodes manually, but that adds operational overhead and won’t work well with autoscaling.
What’s the right way of doing it with Ray? Thank you very much.
Hi @ivan_prado, welcome to the forums!
Could you create a custom Docker image that contains the model (relevant docs)? That way, when a node starts running, the model is immediately accessible.
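As a rough sketch of what that could look like on the service side: the class below loads the model from the node’s local disk in its constructor. The path `/opt/models/model.pkl`, the `MODEL_PATH` environment variable, the `ModelService` class, and the pickled dict standing in for the model are all assumptions made up for this illustration; in a real Serve application you would put the `@serve.deployment` decorator on the class and use your own model format.

```python
import os
import pickle
from pathlib import Path

# Hypothetical location where the Dockerfile copied the model artifact.
DEFAULT_MODEL_PATH = "/opt/models/model.pkl"


class ModelService:
    """Loads the model from local disk once, when the replica is constructed."""

    def __init__(self):
        # MODEL_PATH is an assumed override for this sketch, not a Ray setting.
        model_path = Path(os.environ.get("MODEL_PATH", DEFAULT_MODEL_PATH))
        if not model_path.is_file():
            raise FileNotFoundError(f"Model not found in image at {model_path}")
        with model_path.open("rb") as f:
            self.model = pickle.load(f)

    def predict(self, x):
        # Placeholder inference: the stand-in "model" is a dict with a scale factor.
        return x * self.model["scale"]
```

Because every node started from the image already has the file, new nodes added by the autoscaler can construct replicas without any extra copy step.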
Thank you @shrekris for answering!
One problem I see with the Docker approach is that we would have to restart the whole cluster whenever we want to deploy a new model, which would stop this service and any other services running in the same cluster.
Maybe I’m not understanding it well, though. How would you deploy a new Docker image without interrupting the service?
You would have to start a new cluster. One way to do this without interrupting the service is to:
- Start the new cluster, and start the new Serve application on it.
- Shift traffic from your old cluster to your new cluster.
- Shut down the old cluster.
This is the pattern that KubeRay uses to perform zero-downtime updates.
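To make the traffic-shifting step more concrete, here is a minimal sketch of a gradual cutover with a rollback. It is not KubeRay’s implementation: `TrafficSplit`, `cutover_plan`, `run_cutover`, and the two callbacks are hypothetical names, and the callbacks stand in for whatever health check and load balancer or DNS update you actually use.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class TrafficSplit:
    old_weight: int  # percent of requests sent to the old cluster
    new_weight: int  # percent of requests sent to the new cluster


def cutover_plan(step: int = 25) -> Iterator[TrafficSplit]:
    """Yield traffic splits that move from 0% to 100% on the new cluster."""
    if not 0 < step <= 100:
        raise ValueError("step must be in the range 1..100")
    new = 0
    while new < 100:
        new = min(100, new + step)
        yield TrafficSplit(old_weight=100 - new, new_weight=new)


def run_cutover(
    is_new_cluster_healthy: Callable[[], bool],
    apply_split: Callable[[TrafficSplit], None],
    step: int = 25,
) -> bool:
    """Shift traffic step by step; send everything back to the old cluster on failure."""
    for split in cutover_plan(step):
        if not is_new_cluster_healthy():
            apply_split(TrafficSplit(old_weight=100, new_weight=0))
            return False
        apply_split(split)
    # Only after this returns True is it safe to shut down the old cluster.
    return True
```

For example, `run_cutover(check_health, update_load_balancer)` would be called once the new cluster and its Serve application are up, where both arguments are functions you supply.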