So I’m trying to manually set up a Ray cluster on two EC2 instances (one head, one worker node), each with 16 CPUs, and for some reason every time I apply the ‘manual installation’ steps it appears that the workers are not added correctly. They actually appear to be added as ‘head’ nodes, since they are not tucked under the ‘head node’ IP as they normally are when running locally.
On the head node I ran ray start --head --port=6379
On the worker node I ran ray start --address=HEAD_NODE_IP:6379
I opened the UI on the head node and, sure enough, I see just two nodes with no workers.
I also tried setting the number of CPUs manually but without luck.
So below you will find what I expected (perhaps wrongly). As you can see, there are as many workers as I have CPUs under the HEAD local host node. The workers are also referenced in the ‘WORKER’ tab when I click into the Nodes detail.
After running the on-prem setup against our 2 EC2 instances, I don’t see any registered workers when I expand the node, and the ‘WORKER’ tab shows 0 workers.
Hi,
Did you try adding the --node-ip-address=… argument when starting the head and the worker nodes? This should be the IP address of that particular machine on the network.
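To check from the head node whether the second machine actually joined the cluster, something like the following sketch might help. It assumes the cluster was started with the ray start commands above and that the script runs on the head node. The helper names summarize_nodes and check_cluster are made up for illustration. The dictionary keys are the ones I would expect ray.nodes() to return, so adjust them if your Ray version reports different ones.

```python
from typing import Any, Dict, List


def summarize_nodes(nodes: List[Dict[str, Any]]) -> Dict[str, float]:
    """Map each alive node's IP address to the CPU count it registered."""
    summary: Dict[str, float] = {}
    for node in nodes:
        if not node.get("Alive", False):
            continue  # skip nodes the cluster has marked dead
        ip = node.get("NodeManagerAddress", "unknown")
        cpus = node.get("Resources", {}).get("CPU", 0.0)
        # Add up CPUs in case several entries share one IP address.
        summary[ip] = summary.get(ip, 0.0) + cpus
    return summary


def check_cluster() -> Dict[str, float]:
    """Attach to the running cluster and summarize its alive nodes."""
    import ray  # imported here so summarize_nodes works without Ray installed

    ray.init(address="auto")  # connect to the cluster started by `ray start`
    try:
        return summarize_nodes(ray.nodes())
    finally:
        ray.shutdown()
```

Calling print(check_cluster()) on the head node should list one entry per machine. If both instances joined, I would expect two IP addresses with 16 CPUs each; if only the head node shows up, the worker never connected.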