I’m trying to cluster two local machines.
<Setup: Success>
head: 192.168.0.21
node: 192.168.0.22
On the head machine (in Python):
ray.init(address='auto')
On the node machine (in a terminal):
ray start --address='192.168.0.21:6379'
Again on the head machine (in Python),
ray.nodes() prints:
[
{'NodeID': 'f22c89a7bfb82521a5cd58104da454063f9d731ec507f3af6f7410e4', 'Alive': True, 'NodeManagerAddress': '192.168.0.22', 'NodeManagerHostname': 'cp22', 'NodeManagerPort': 35225, 'ObjectManagerPort': 44411, 'ObjectStoreSocketName': '/tmp/ray/session_2022-05-28_00-29-38_080158_24534/sockets/plasma_store', 'RayletSocketName': '/tmp/ray/session_2022-05-28_00-29-38_080158_24534/sockets/raylet', 'MetricsExportPort': 61799, 'alive': True, 'Resources': {'CPU': 4.0, 'object_store_memory': 858294681.0, 'node:192.168.0.22': 1.0, 'memory': 2002687591.0}},
{'NodeID': 'a3807ff7292d67422155743a6bc78046cc80b5fe0bc0999d370e7308', 'Alive': True, 'NodeManagerAddress': '192.168.0.21', 'NodeManagerHostname': 'cp21', 'NodeManagerPort': 43211, 'ObjectManagerPort': 39253, 'ObjectStoreSocketName': '/tmp/ray/session_2022-05-28_00-29-38_080158_24534/sockets/plasma_store', 'RayletSocketName': '/tmp/ray/session_2022-05-28_00-29-38_080158_24534/sockets/raylet', 'MetricsExportPort': 44215, 'alive': True, 'Resources': {'object_store_memory': 940081152.0, 'node:192.168.0.21': 1.0, 'CPU': 4.0, 'memory': 1880162304.0}}
]
This shows that both nodes are available. Also,
ray.cluster_resources() prints:
{'object_store_memory': 1798375833.0, 'CPU': 8.0, 'memory': 3882849895.0, 'node:192.168.0.22': 1.0, 'node:192.168.0.21': 1.0}
so the setup seems to be fine.
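The same check can be done in code instead of by reading the printed list. Below is a minimal sketch; the helper name alive_node_ips and the trimmed sample_nodes list are made up for illustration, and only the 'Alive' and 'NodeManagerAddress' keys from the output above are used.

```python
from typing import Any, Dict, List


def alive_node_ips(nodes: List[Dict[str, Any]]) -> List[str]:
    """Return the sorted IP addresses of the nodes reported as alive.

    `nodes` is expected to look like the list returned by ray.nodes():
    each entry has an 'Alive' flag and a 'NodeManagerAddress' string.
    """
    return sorted(n["NodeManagerAddress"] for n in nodes if n.get("Alive"))


# Trimmed, hand-copied version of the ray.nodes() output above
# (hypothetical sample; a live cluster would pass ray.nodes() instead).
sample_nodes = [
    {"Alive": True, "NodeManagerAddress": "192.168.0.22", "Resources": {"CPU": 4.0}},
    {"Alive": True, "NodeManagerAddress": "192.168.0.21", "Resources": {"CPU": 4.0}},
]

print(alive_node_ips(sample_nodes))  # ['192.168.0.21', '192.168.0.22']
```

On a connected driver, alive_node_ips(ray.nodes()) should give the same two addresses.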
But when I run the code from the Ray documentation that checks whether the cluster works:
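import time
import ray
@ray.remote
def f():
    time.sleep(0.01)
    # Report the IP address of the node that executed this task.
    return ray._private.services.get_node_ip_address()
# Collect the distinct node IPs that ran at least one of the 1000 tasks.
set(ray.get([f.remote() for _ in range(1000)]))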
it prints {'192.168.0.21'}.
So only the head (192.168.0.21) ran tasks; the node (192.168.0.22) ran none.
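A set only shows which nodes ran at least one task, so a per-node count can be more informative. The following is a rough sketch, not something from the Ray documentation: count_by_node and probe_cluster are hypothetical names, and num_tasks and sleep_s are parameters added here so the experiment can be repeated with more or longer tasks. Whether that changes the placement is something to check, not a given.

```python
import time
from collections import Counter
from typing import Dict, Iterable


def count_by_node(ips: Iterable[str]) -> Dict[str, int]:
    """Count how many task results came back from each node IP."""
    return dict(Counter(ips))


def probe_cluster(num_tasks: int = 1000, sleep_s: float = 0.01) -> Dict[str, int]:
    """Run num_tasks sleeping tasks and count the results per node IP.

    Needs a running Ray cluster. ray is imported inside the function so
    that count_by_node stays usable on machines without Ray installed.
    """
    import ray

    if not ray.is_initialized():
        ray.init(address="auto")

    @ray.remote
    def f() -> str:
        time.sleep(sleep_s)
        # Same private helper as in the snippet above.
        return ray._private.services.get_node_ip_address()

    return count_by_node(ray.get([f.remote() for _ in range(num_tasks)]))


# Pure-Python demonstration of the counting step on made-up results.
print(count_by_node(["192.168.0.21", "192.168.0.21", "192.168.0.22"]))
# {'192.168.0.21': 2, '192.168.0.22': 1}
```

On the head machine, probe_cluster() repeats the original test, and a call such as probe_cluster(sleep_s=1.0) tries it with longer tasks.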
Is there something that I’m missing?