Hi all,
I am running a hyperparameter search with Ray Tune on a Ray cluster. My cluster is on GCP.
I noticed that after a while (an hour or so), the cluster stops autoscaling and no longer requests more worker nodes. On GCP, I am trying to add an A100, but availability is limited. I would like the cluster to keep trying to autoscale.
Is it possible to re-initiate the addition of worker nodes without interrupting any running jobs? I saw that there is the ray up .yaml --no-restart option, but wasn’t sure whether that would re-initiate the autoscaler.
Thanks!