Hi all!
I am trying to set up a cluster on GCP with the Ray Cluster Launcher. I install Ray from source on the cluster nodes.
Although I manage to set up the nodes, I cannot relaunch Ray on the cluster. Whenever I run "ray up", the head node is fine, but I get the following errors for the worker node (from monitor.err):
rsync: failed to set times on "/home/ubuntu/ray/bazel-bin": Operation not permitted (1)
rsync: failed to set times on "/home/ubuntu/ray/bazel-out": Operation not permitted (1)
rsync: failed to set times on "/home/ubuntu/ray/bazel-ray": Operation not permitted (1)
rsync: failed to set times on "/home/ubuntu/ray/bazel-testlogs": Operation not permitted (1)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1196) [sender=3.1.2]
As a result, the worker node fails and the Ray launcher deletes it.
I would like to ask:
- Does anyone know why this is happening?
- Is there any way to keep the worker node from being deleted from GCP when Ray fails to restart? Otherwise I have to reinstall Ray from source on a new node, which takes a long time.
Thank you!