Understand the recommended Ray cluster release workflow on GCP

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes significant difficulty to completing my task, but I can work around it.

I'm just getting started with Ray and have a few questions about the recommended development/release workflow:

  1. Right now we are following this cluster approach: Launching Ray Clusters on GCP — Ray 2.31.0 (using GCP Compute Engine instances, not GKE).
  2. For every release, we build our code into a Docker image and point this setting at the new image: Cluster YAML Configuration Options — Ray 2.31.0.
  3. We tear down the existing Ray cluster, then bring up a new cluster using ray up so that the new cluster runs on the updated code base.

This is inconvenient because:

We have to run ray down first and then ray up with the new Docker image version. If we run ray up directly against the existing cluster, the new Docker image does not get pulled, so the new code is not used in the cluster.
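
For reference, the release flow described above can be sketched roughly as follows. This is only an illustration: the helper names (set_image, release_commands), the image names, and the cluster.yaml path are made up for the example. It assumes the cluster YAML has been loaded into a dict with a docker.image field and that the ray CLI is on the PATH. The script only prints the commands and does not run them.

```python
import copy
import shlex


def set_image(cluster_config: dict, image: str) -> dict:
    """Return a copy of the cluster config with docker.image set to `image`.

    `cluster_config` stands in for the parsed cluster YAML (hypothetical example data).
    """
    updated = copy.deepcopy(cluster_config)
    updated.setdefault("docker", {})["image"] = image
    return updated


def release_commands(config_path: str) -> list:
    """Commands for the tear-down-then-recreate release described in the post."""
    return [
        ["ray", "down", "-y", config_path],  # stop the existing cluster
        ["ray", "up", "-y", config_path],    # start a new one with the new image
    ]


if __name__ == "__main__":
    # Hypothetical config and image tags, for illustration only.
    config = {
        "cluster_name": "serve-cluster",
        "docker": {"image": "gcr.io/my-project/app:v1", "container_name": "ray_container"},
    }
    new_config = set_image(config, "gcr.io/my-project/app:v2")
    print(new_config["docker"]["image"])
    for cmd in release_commands("cluster.yaml"):
        print(shlex.join(cmd))
```

The point of the sketch is that every release goes through both commands, which is the part we would like to avoid.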

Is there a recommended code release approach that would allow me to reuse the existing head node? Thank you.

What are you trying to do on top of Ray? Are you trying to serve an ML model, run pre-processing, train something, run offline inference, etc.?

We are trying to serve ML models using Ray Serve, so we are constantly adding new business logic.

Ah, got it. In the Ray Serve context, check this out here
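
As a rough illustration of what a Serve-level release might look like (an assumption on my part, not necessarily what the link describes): Ray Serve applications can be described in a config file with an applications list, and each application can carry a runtime_env whose working_dir points at a packaged copy of the code. The sketch below only builds such a config as a Python dict. The helper name, application name, import path, and URL are all hypothetical.

```python
import json


def serve_app_config(name: str, import_path: str, code_uri: str, route_prefix: str = "/") -> dict:
    """Build a Serve config dict for one application (illustrative values only)."""
    return {
        "applications": [
            {
                "name": name,
                "route_prefix": route_prefix,
                "import_path": import_path,
                # Assumption: code is shipped as a remote zip rather than baked into the image.
                "runtime_env": {"working_dir": code_uri},
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical application name, import path, and release URL.
    config = serve_app_config(
        name="model_a",
        import_path="app.main:deployment_graph",
        code_uri="https://example.com/releases/app-v2.zip",
    )
    print(json.dumps(config, indent=2))
```

Under that assumption, a release would mean publishing a new code package, changing code_uri, writing the config out as YAML, and applying it to the running cluster with serve deploy, rather than running ray down and ray up. Whether this fits depends on how much of the release lives in the Docker image (for example system dependencies) rather than in the application code.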

For an even better version see the Hosted Ray option on Anyscale here