Can Ray support more than 1000 nodes?

Dear all,

I would like to run a 1000-node Ray cluster with 1 CPU and 4 GB of memory per node.

I use this code to benchmark my cluster:

import time
import ray

# ray.init()
ray.init(address="head-service:6379")

print(len(ray.nodes()))
print('This cluster consists of {} CPU resources in total'.format(
    ray.cluster_resources()['CPU']))

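# Innermost task: does no work, so it measures pure scheduling overhead.
@ray.remote
def real_empty():
    pass

# Outer task: submits 1000 nested real_empty tasks and waits for all of them.
@ray.remote
def empty():
    result_refs = []
    for i in range(1000):
        result_refs.append(real_empty.remote())
    ray.get(result_refs)

# Each loop iteration submits 1000 outer tasks, i.e. 1000 * 1000 nested tasks.
while True:
    result_refs = []
    for i in range(1000):
        result_refs.append(empty.remote())
    ray.get(result_refs)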

But the performance is very low: “real_empty” is only called 3-4 times per second.

Can Ray support such a large number of nodes?

Or what am I doing wrong?

Thx

If you have this many resources, you should use fewer nodes with more resources per node.
For example, with 50 cores and 200 GB of memory per node, you only need 20 nodes, which still gives you 1000 CPUs and 4000 GB of memory in total.
Resources should be scheduled by Ray, not by a container orchestration system like Kubernetes.
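
To make the arithmetic above concrete, here is a minimal sketch in plain Python (no Ray calls) that works out how many larger nodes cover the same total resources. The function name nodes_needed and its parameters are made up for illustration and are not part of Ray.

```python
import math

def nodes_needed(total_cpus, total_mem_gb, cpus_per_node, mem_gb_per_node):
    """Hypothetical helper: smallest node count that covers both CPU and memory."""
    by_cpu = math.ceil(total_cpus / cpus_per_node)
    by_mem = math.ceil(total_mem_gb / mem_gb_per_node)
    # Whichever resource needs more nodes decides the cluster size.
    return max(by_cpu, by_mem)

# Original plan: 1000 nodes x (1 CPU, 4 GB) = 1000 CPUs and 4000 GB in total.
# Consolidated onto nodes with 50 cores and 200 GB each:
print(nodes_needed(1000, 4000, 50, 200))  # prints 20
```

If the per-node CPU and memory figures do not divide the totals evenly, the helper rounds up, so the cluster never ends up with less capacity than the original plan.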