Why does KubeRay disable autoscaling by default?

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

According to the KubeRay doc on Autoscaling, autoscaling is an option when we deploy a Ray cluster.

What considerations did the Ray team weigh when deciding to turn off autoscaling by default?

Suppose I deploy a Ray cluster with the minimum worker nodes set to 1 and the maximum worker nodes set to 5. If I don’t set enableInTreeAutoscaling to true, does that mean the number of worker nodes is fixed at 1 and will never change?

What considerations did the Ray team weigh when deciding to turn off autoscaling by default?

It simply started off as disabled by default because it’s a newer feature and there are no (known) large scale production users of autoscaling with KubeRay at the moment.

Suppose I deploy a Ray cluster with the minimum worker nodes set to 1 and the maximum worker nodes set to 5. If I don’t set enableInTreeAutoscaling to true, does that mean the number of worker nodes is fixed at 1 and will never change?

Yep

Hi Alex, thanks for your kind reply.

it’s a newer feature and there are no (known) large scale production users of autoscaling with KubeRay at the moment.

It’s interesting; I really love the autoscaling feature when I deploy on GCP or use the legacy Ray operator on K8s. Autoscaling lets me have a Ray cluster with 0 ~ 10 Ray worker nodes. It saves money. lol

Under what circumstances would users want a fixed-size Ray cluster on K8s?

Some users spin up short-lived Ray clusters for their jobs. You can imagine that for some batch data processing job (which practically runs on a fixed-size cluster anyway), if you spin up the cluster, run the job, then tear down the cluster, autoscaling doesn’t provide as much benefit.

Others simply use some more domain-specific autoscaling logic. Note that with KubeRay, you can update the RayCluster CR and the operator will add or remove pods in an existing cluster.
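
For illustration, a minimal sketch of that kind of manual or custom scaling (the namespace, cluster name, and worker group index below are placeholders, and the exact field path in the RayCluster CR may vary across KubeRay versions):

    # Hypothetical example: bump the desired worker count of the first worker
    # group of an existing RayCluster; the operator then reconciles the pods.
    kubectl -n my-namespace patch raycluster my-raycluster --type='json' \
      -p='[{"op": "replace", "path": "/spec/workerGroupSpecs/0/replicas", "value": 4}]'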

Some users spin up short-lived Ray clusters for their jobs. You can imagine that for some batch data processing job (which practically runs on a fixed-size cluster anyway), if you spin up the cluster, run the job, then tear down the cluster, autoscaling doesn’t provide as much benefit.

Got it.
For the short-lived ETL job, I can imagine two workflows with KubeRay. Which one do you suggest?

  1. Fixed-size and short-lived cluster (as you described above):

    helm install [MY-FIXED-SIZE-CLUSTER] --namespace [SPECIAL-NAMESPACE] .
    # Run the Ray program with Ray submit or Ray Client (see the sketch after this list)
    # The short-lived ETL job finishes
    helm delete [MY-FIXED-SIZE-CLUSTER] --namespace [SPECIAL-NAMESPACE]
    

    Some things I can imagine:
    pros:
    - There is no dedicated Ray Head pod standing by forever
    - This fixed-size cluster only serves the short-lived ETL job
    cons:
    - Users must have permission to use kubectl and helm commands

  2. A Ray cluster with autoscaling:
    Assume I set the Ray workers to 0 ~ 100 and enable autoscaling.
    Some things I can imagine:
    pros:
    - Users can connect to the Ray cluster using Ray Client without kubectl and helm commands
    cons:
    - There is a dedicated Ray Head pod
    - The resources of the Ray Head pod are fixed. A high-spec Ray Head would waste money; a low-spec Ray Head would hurt performance
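
For the "run the Ray program" step in workflow 1, here is a minimal sketch (the namespace, service name, and script name are placeholders; the head service name follows the usual KubeRay <cluster>-head-svc pattern but may differ in your chart, and the ray job CLI assumes a Ray version that ships the job submission API):

    # Hypothetical example: reach the cluster's dashboard from outside K8s and
    # submit the short-lived ETL script as a Ray job; helm delete then tears
    # the whole cluster down.
    kubectl -n [SPECIAL-NAMESPACE] port-forward service/[MY-FIXED-SIZE-CLUSTER]-head-svc 8265:8265 &
    ray job submit --address http://127.0.0.1:8265 -- python my_etl_job.py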

Others simply use some more domain-specific autoscaling logic. Note that with KubeRay, you can update the RayCluster CR and the operator will add or remove pods in an existing cluster.

Sounds great. We can have custom autoscaling with KubeRay.

Here is some configuration related to this question. I use the Helm chart to deploy the Ray cluster.

Since autoscaling has become an optional sidecar, the replicas, miniReplicas, and maxiReplicas fields can be a little confusing.


Case 1:

head:
  enableInTreeAutoscaling: false
worker:
  replicas: 3
  miniReplicas: 1
  maxiReplicas: 5

This Ray cluster starts with 3 Ray workers and the number of workers doesn’t change.


Case 2:

head:
  enableInTreeAutoscaling: false
worker:
  # replicas: 3
  miniReplicas: 1
  maxiReplicas: 5

This Ray cluster starts with 1 Ray worker and the number of workers doesn’t change.


Case 3:

head:
  enableInTreeAutoscaling: true
worker:
  # replicas: 3
  miniReplicas: 1
  maxiReplicas: 5

This Ray cluster starts with 1 Ray worker and the number of workers changes between 1 and 5 depending on the demand from tasks/actors.


Case 4:

head:
  enableInTreeAutoscaling: true
worker:
  replicas: 3
  miniReplicas: 1
  maxiReplicas: 5

This Ray cluster starts with 3 Ray workers, but the workers scale down to 1 after a few seconds.
The number of workers then changes between 1 and 5 depending on the demand from tasks/actors.


Cases 1 and 2 are reasonable. The Ray cluster looks at either replicas or miniReplicas because enableInTreeAutoscaling is off.

Case 3 is also reasonable.

Case 4 surprised me. It seems that when the Ray cluster starts up, it uses replicas; once the cluster is up, the autoscaler takes over.
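
If that initial scale-down is unwanted, one workaround is to start the autoscaling cluster with replicas already equal to miniReplicas. This is a sketch only: the --set paths mirror the values shown above and may differ between chart versions, and the release and namespace names are placeholders.

    # Hypothetical example: avoid the Case 4 surprise by not requesting more
    # initial workers than the autoscaler's lower bound.
    helm install [MY-AUTOSCALING-CLUSTER] . --namespace [SPECIAL-NAMESPACE] \
      --set head.enableInTreeAutoscaling=true \
      --set worker.replicas=1 \
      --set worker.miniReplicas=1 \
      --set worker.maxiReplicas=5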

Fixed-size and short-lived cluster (as you described above):

A pro here is that it’s a lot easier to specify different Docker images for different jobs.
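
For example, a sketch of per-job image overrides at install time (image.repository and image.tag are assumed value keys, so check the chart’s values.yaml for the actual names; the release, namespace, and image names are placeholders):

    # Hypothetical example: give each short-lived cluster the Docker image its
    # job needs, instead of one image shared by every workload.
    helm install [ETL-JOB-A-CLUSTER] . --namespace [SPECIAL-NAMESPACE] \
      --set image.repository=my-registry/ray-etl-job-a \
      --set image.tag=v1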

A Ray cluster with autoscaling:

Another advantage here is that job submission may be faster since there may already be hot nodes.

A con here is that dependency management for anything runtime envs can’t handle can be quite a bit more annoying.
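
For the part runtime envs can handle (pip packages, environment variables, a working_dir), a per-job submission against the shared cluster might look like this sketch (the address, packages, and script name are placeholders, and it assumes a Ray version with the job submission CLI):

    # Hypothetical example: ship per-job pip dependencies with the job itself
    # instead of baking them into the shared cluster's image.
    ray job submit --address http://127.0.0.1:8265 \
      --runtime-env-json '{"pip": ["pandas", "pyarrow"]}' \
      -- python my_etl_job.py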


Ultimately, you should pick between the options based on which pros/cons matter most to you, but most users with simple batch jobs tend to go with the cluster-per-job approach (which, of course, you can still use autoscaling with if it helps). Serving users tend to value the single-large-cluster approach more.
