Semantics of `ray up` on an existing cluster?

What exactly happens when I run `ray up` on an existing cluster?
Does it relaunch the nodes? Restart the workers with the new configuration? What happens to currently running jobs?

I couldn’t find this stated clearly in the documentation.

Hey @Yoav,

Running `ray up YAML` on an existing cluster essentially does the following:

  • If the head node matches the cluster specification, the file mounts are reapplied and the `setup_commands` and `ray start` commands are run.
    • Some caching may apply here, so file mounts and setup can be skipped if nothing has changed.
  • If the head node is out of date with respect to the YAML (for example, if you changed the head node type in the YAML), the out-of-date node is terminated and a new node is provisioned to replace it.
    • File mounts, setup commands, and `ray start` are then applied to the new node.
  • Once the head node reaches a consistent state (i.e., after its `ray start` commands finish), the same procedure is applied to all worker nodes.
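To make the ordering concrete, here's a toy Python model of the procedure above. It is not Ray code: `Node`, `update_node`, `ray_up`, the node type strings, and the hash-based caching check are all hypothetical stand-ins used only to illustrate the sequence (reconcile node type, then mounts/setup, then stop/start; head first, workers after).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy stand-in for a cluster node (hypothetical, not a Ray class)."""
    name: str
    node_type: str
    applied_hash: Optional[str] = None  # hash of mounts/setup last applied here
    steps: List[str] = field(default_factory=list)


def update_node(node: Node, wanted_type: str, config_hash: str) -> Node:
    """Bring one node in line with the YAML, following the steps described above."""
    if node.node_type != wanted_type:
        # Out-of-date node: terminate it and provision a fresh replacement.
        node = Node(name=node.name, node_type=wanted_type,
                    steps=["terminate old", "provision new"])
    if node.applied_hash != config_hash:
        node.steps += ["file mounts", "setup_commands"]
        node.applied_hash = config_hash
    else:
        # Assumed caching: nothing changed since the last run, so skip mounts/setup.
        node.steps.append("skip mounts/setup (cached)")
    # Start commands typically run `ray stop` then `ray start`, killing running jobs.
    node.steps.append("ray stop + ray start")
    return node


def ray_up(head: Node, workers: List[Node], head_type: str,
           worker_type: str, config_hash: str):
    # The head node is brought to a consistent state first...
    head = update_node(head, head_type, config_hash)
    # ...and only then are the workers updated the same way.
    workers = [update_node(w, worker_type, config_hash) for w in workers]
    return head, workers


# Example: the head node type changed in the YAML, the workers did not.
head = Node("head", "m5.large", applied_hash="v1")
workers = [Node("w1", "m5.large", applied_hash="v1")]
head, workers = ray_up(head, workers, head_type="m5.xlarge",
                       worker_type="m5.large", config_hash="v1")
print(head.steps)        # ['terminate old', 'provision new', 'file mounts', 'setup_commands', 'ray stop + ray start']
print(workers[0].steps)  # ['skip mounts/setup (cached)', 'ray stop + ray start']
```

Note that even the worker whose setup was skipped still goes through stop/start in this model, which is why running jobs get interrupted.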

The `ray start` commands typically run `ray stop` followed by `ray start`, so this will kill any currently running jobs.

You can run `ray up --no-restart YAML` to avoid running `ray stop`/`ray start` on the head node.
You can run `ray up --restart-only YAML` to skip the setup commands and only run `ray stop`/`ray start` on all nodes.
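A quick way to keep the two flags straight is a small table of which steps run on a node. The helper below is a hypothetical sketch (not part of Ray) that encodes only what's described above; I believe Ray's CLI also refuses to combine `--no-restart` with `--restart-only`, so the sketch raises an error for that case.

```python
from typing import List


def planned_steps(is_head: bool, no_restart: bool = False,
                  restart_only: bool = False) -> List[str]:
    """Hypothetical helper: steps `ray up` runs on one node, per the flags above."""
    if no_restart and restart_only:
        # Assumed to mirror the CLI rejecting this combination.
        raise ValueError("--no-restart and --restart-only are mutually exclusive")
    steps: List[str] = []
    if not restart_only:
        # --restart-only skips file mounts/setup entirely.
        steps += ["file mounts", "setup_commands"]
    # Per the description above, --no-restart skips stop/start on the head node.
    if not (no_restart and is_head):
        steps += ["ray stop", "ray start"]
    return steps


print(planned_steps(is_head=True))                      # ['file mounts', 'setup_commands', 'ray stop', 'ray start']
print(planned_steps(is_head=True, no_restart=True))     # ['file mounts', 'setup_commands']
print(planned_steps(is_head=False, restart_only=True))  # ['ray stop', 'ray start']
```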

Hope that helps!