Scale up from 0

Is it possible for Ray cluster to scale-up from 0 workers? If I start a Kubernetes cluster using minWorkers: 0 and I start a job no workers are launched. If I change that to minWorkers: 1 then everything scales up and works fine. The autoscaler doesn’t seem to recognize the demand when there are no workers.

======== Autoscaler status: 2021-07-13 05:56:40.044341 ========
Node status
 1 rayHeadType
 (no pending nodes)
Recent failures:
 (no failures)


 0.00/1.367 GiB memory
 0.00/0.571 GiB object_store_memory

 (no resource demands)
ray,ray:2021-07-13 05:56:40,046	DEBUG -- Cluster status: 0 nodes
 - MostDelayedHeartbeats: {'': 0.14696931838989258}
 - NodeIdleSeconds: Min=0 Mean=0 Max=0
 - ResourceUsage: 0.0 GiB/1.37 GiB memory, 0.0 GiB/0.57 GiB object_store_memory
 - TimeSinceLastHeartbeat: Min=0 Mean=0 Max=0
Worker node types:
ray,ray:2021-07-13 05:56:45,170	DEBUG -- Cluster resources: [{'memory': 1468006400.0, 'object_store_memory': 612928463.0, 'node:': 1.0}]
ray,ray:2021-07-13 05:56:45,170	DEBUG -- Node counts: defaultdict(<class 'int'>, {'rayHeadType': 1})
ray,ray:2021-07-13 05:56:45,170	DEBUG -- Placement group demands: []
ray,ray:2021-07-13 05:56:45,171	DEBUG -- Resource demands: []
ray,ray:2021-07-13 05:56:45,171	DEBUG -- Unfulfilled demands: []
ray,ray:2021-07-13 05:56:45,190	DEBUG -- Node requests: {}
ray,ray:2021-07-13 05:56:45,218	INFO -- 

Got it. Can you provide the Python script you run after the cluster is launched?

This seems to be a recent regression – are you deploying with helm and the default images?

import time

import ray

LOCAL_PORT = 10001

def do_some_work(x):
    print('doing some work')
    time.sleep(1)  # Replace this is with work you need to do.
    return x*x

def main():
    start = time.time()
    results = [ray.get(do_some_work.remote(x)) for x in range(4)]
    print("duration =", time.time() - start)
    print("results = ", results)

if __name__ == '__main__':

I am deploying with helm and the head node is using the default image. The worker nodes use a custom image derived from rayproject/ray:latest-py37-cpu.

Got it.
Would you mind pasting the values.yaml to make sure we’re not missing anything?

    memory: 2000Mi
      role: head
      CPU: 0
    CPU: 62
    maxWorkers: 800
    memory: 244000Mi
    minWorkers: 1
      role: worker

Thanks! The 0 CPU annotation is an important detail.