Can Ray support more than 1000 nodes?

yrjyrj123 · January 31, 2022, 2:53pm

Dear all,

I would like to run a 1000-node Ray cluster with 1 CPU and 4 GB of memory per node.

I use this code to benchmark my cluster:

import time
import ray

# ray.init()
ray.init(address="head-service:6379")

print(len(ray.nodes()))
print('This cluster consists of {} CPU resources in total'.format(
    ray.cluster_resources()['CPU']))

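@ray.remote
def real_empty():
    # No-op task: measures pure scheduling overhead.
    pass

@ray.remote
def empty():
    # Each outer task fans out 1000 no-op tasks and waits for them all.
    result_refs = [real_empty.remote() for _ in range(1000)]
    ray.get(result_refs)

# Repeatedly launch 1000 outer tasks (1,000,000 no-op tasks per round).
while True:
    result_refs = [empty.remote() for _ in range(1000)]
    ray.get(result_refs)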

But the performance is very low: “real_empty” can only be called 3-4 times per second.
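For anyone who wants to reproduce a calls-per-second figure like this, here is a minimal sketch of one way to time it. The helper names `tasks_per_second` and `ray_batch` are hypothetical (they are not part of Ray), the batch sizes are deliberately smaller than in the benchmark, and `ray_batch` assumes `ray.init(...)` has already been called.

```python
import time
from typing import Callable


def tasks_per_second(run_batch: Callable[[], int], rounds: int = 3) -> float:
    """Run `run_batch` several times and return completed tasks per second.

    `run_batch` must block until its tasks finish and return how many ran.
    """
    total_tasks = 0
    start = time.perf_counter()
    for _ in range(rounds):
        total_tasks += run_batch()
    elapsed = time.perf_counter() - start
    return total_tasks / elapsed


def ray_batch(outer: int = 10, inner: int = 10) -> int:
    """Hypothetical batch mirroring the benchmark at a smaller size.

    Assumes ray.init(...) has already been called by the caller.
    """
    import ray  # imported lazily so the timing helper works without Ray

    @ray.remote
    def real_empty():
        pass

    @ray.remote
    def empty():
        # Fan out `inner` no-op tasks and wait for all of them.
        ray.get([real_empty.remote() for _ in range(inner)])

    ray.get([empty.remote() for _ in range(outer)])
    return outer * inner  # number of real_empty calls completed
```

After connecting to the cluster, `print(tasks_per_second(ray_batch))` would report the measured rate. Only the inner `real_empty` calls are counted, so the outer `empty` tasks show up as overhead in the result.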

Can Ray support such a large number of nodes?

Or what am I doing wrong?

Thx

jerky · February 2, 2022, 11:14am

With this many resources, you should use fewer nodes with more resources per node.
For example, if each node has 50 cores and 200 GB of memory, you only need 20 nodes.
Resources should be scheduled by Ray, not by a container orchestration system like Kubernetes.
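As a quick check of that arithmetic, the sketch below compares the two layouts. `ClusterShape` is a hypothetical helper written for this illustration, not a Ray API; the node counts and sizes are the ones quoted in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterShape:
    """Hypothetical description of a homogeneous cluster layout."""
    nodes: int
    cpus_per_node: int
    mem_gb_per_node: int

    @property
    def total_cpus(self) -> int:
        return self.nodes * self.cpus_per_node

    @property
    def total_mem_gb(self) -> int:
        return self.nodes * self.mem_gb_per_node


# Layout from the question: many small nodes.
original = ClusterShape(nodes=1000, cpus_per_node=1, mem_gb_per_node=4)
# Layout from the suggestion: few large nodes.
consolidated = ClusterShape(nodes=20, cpus_per_node=50, mem_gb_per_node=200)

print(original.total_cpus, consolidated.total_cpus)      # 1000 1000
print(original.total_mem_gb, consolidated.total_mem_gb)  # 4000 4000
```

Both layouts expose the same total CPU and memory, so the difference is only in how many nodes Ray has to coordinate.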
