What were the points of consideration when the Ray Team decided to turn autoscaling off by default?
When I deploy a Ray cluster with the minimum worker nodes set to 1 and the maximum set to 5, and I don’t set enableInTreeAutoscaling to true, does that mean the number of worker nodes is fixed at 1 and will never change?
What were the points of consideration when the Ray Team decided to turn autoscaling off by default?
It simply started off as disabled by default because it’s a newer feature and there are no (known) large-scale production users of autoscaling with KubeRay at the moment.
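For reference, here is a rough sketch of how those knobs appear in a RayCluster CR, trimmed from the KubeRay sample configs (the apiVersion, image, and cluster name are illustrative and may differ across KubeRay versions):

kubectl apply -f - <<'EOF'
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: raycluster-demo
spec:
  enableInTreeAutoscaling: false   # off by default; no autoscaler sidecar is injected
  headGroupSpec:
    rayStartParams:
      dashboard-host: "0.0.0.0"
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.0.0
  workerGroupSpecs:
    - groupName: small-group
      replicas: 1      # the size the operator reconciles to when autoscaling is off
      minReplicas: 1   # lower bound, used by the autoscaler when it is enabled
      maxReplicas: 5   # upper bound, used by the autoscaler when it is enabled
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.0.0
EOF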
When I deploy a Ray cluster with the minimum worker nodes set to 1 and the maximum set to 5, and I don’t set enableInTreeAutoscaling to true, does that mean the number of worker nodes is fixed at 1 and will never change?
it’s a newer feature and there are no (known) large-scale production users of autoscaling with KubeRay at the moment.
That’s interesting. I really love the autoscaling feature when I deploy on GCP or use the legacy Ray operator on K8s. Autoscaling lets me have a Ray cluster with 0 ~ 10 Ray worker nodes. It saves money. lol
Under what circumstances would users want a fixed-size Ray cluster on K8s?
Some users spin up short-lived Ray clusters for their jobs. You can imagine that for a batch data processing job (which practically runs on a fixed-size cluster anyway), if you spin up the cluster, run the job, then tear down the cluster, autoscaling doesn’t provide as much benefit.
Others simply use more domain-specific autoscaling logic. Note that with KubeRay, you can update the RayCluster CR and the operator will add pods to or remove pods from an existing cluster.
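For example, a script or external controller could implement that custom scaling by patching the CR directly. A minimal sketch, assuming a cluster named raycluster-demo with a single worker group (the field path follows the KubeRay CRD; adjust the index to match your spec):

# Resize the first worker group of an existing cluster to 3 replicas
kubectl patch raycluster raycluster-demo --type json \
  -p '[{"op": "replace", "path": "/spec/workerGroupSpecs/0/replicas", "value": 3}]'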
Some users spin up short-lived Ray clusters for their jobs. You can imagine that for a batch data processing job (which practically runs on a fixed-size cluster anyway), if you spin up the cluster, run the job, then tear down the cluster, autoscaling doesn’t provide as much benefit.
Got it.
For a short-lived ETL job, I can imagine two workflows with KubeRay. Which one do you suggest?
Fixed-size and short-lived cluster (as you described above):
helm install [MY-FIXED-SIZE-CLUSTER] --namespace [SPECIAL-NAMESPACE] .
# Run the Ray program with Ray job submission or Ray Client (see the sketch below)
# The short-lived ETL job finishes
helm delete [MY-FIXED-SIZE-CLUSTER] --namespace [SPECIAL-NAMESPACE]
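For the "Run the Ray program" step, a sketch of how it might look with the Ray Job Submission CLI; the head service name follows the usual KubeRay naming but is just a placeholder here, and my_etl_job.py is hypothetical:

# Forward the dashboard port of the head service (KubeRay typically creates a
# "<cluster-name>-head-svc" service exposing the dashboard on 8265)
kubectl port-forward --namespace [SPECIAL-NAMESPACE] svc/[MY-FIXED-SIZE-CLUSTER]-head-svc 8265:8265 &

# Submit the short-lived ETL job through the forwarded dashboard address
ray job submit --address http://127.0.0.1:8265 --working-dir . -- python my_etl_job.py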
Some things I can imagine:
pros:
- There is no dedicated Ray head pod standing by forever
- This fixed-size cluster only serves the short-lived ETL job
cons:
- Users must have permission to use kubectl and helm commands
A Ray cluster with autoscaling:
Assume I set the Ray workers to 0 ~ 100 and enable autoscaling.
Some things I can imagine:
pros:
- Users can connect to the Ray cluster with Ray Client, without kubectl or helm commands
cons:
- There is a dedicated Ray head pod standing by all the time
- The resources of the Ray head pod are fixed. A high-spec head pod wastes money; a low-spec head pod hurts performance
Others simply use more domain-specific autoscaling logic. Note that with KubeRay, you can update the RayCluster CR and the operator will add pods to or remove pods from an existing cluster.
Sounds great. We can have custom autoscaling with KubeRay.
This Ray cluster starts with 3 Ray workers, but the workers scale down to 1 after a few seconds.
The number of workers changes between 1 and 5 depending on the demand from tasks/actors.
Cases 1 and 2 are reasonable. The Ray cluster looks at either replicas or minReplicas because enableInTreeAutoscaling is off.
Case 3 is also reasonable.
Case 4 surprised me. It seems that when the Ray cluster starts up, it uses replicas, and once the cluster is already running, the autoscaler takes over.
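One way to double-check which mechanism is driving the worker count is to watch the worker pods while the cluster reconciles. A sketch, assuming the usual labels KubeRay puts on pods (verify them with kubectl get pods --show-labels if your version differs):

# Watch worker pods of the cluster being added and removed
kubectl get pods -l ray.io/cluster=raycluster-demo,ray.io/node-type=worker --watch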
Ultimately, you should pick between the options based on which pros and cons matter most to you, but most users with simple batch jobs tend to go with the cluster-per-job approach (which you can of course still combine with autoscaling if it helps). Serving users tend to value the single large cluster approach more.