Workers Not Recognized on new Cluster

High

Hey Team,

So I’m trying to manually set up a Ray cluster on two EC2 instances (one head, one worker node) with 16 CPUs each, and every time I apply the ‘manual installation’ steps, it appears that the workers are not added correctly. They actually appear to be added as ‘head’ nodes, since they are not nested under the head node’s IP as they normally are when running locally.

On the head node I ran ray start --head --port=6379

On the worker node I ran ray start --address=HEAD_NODE_IP:6379

When I open the UI on the head node, sure enough, I see just two nodes with no workers.

I also tried setting the number of CPUs manually but without luck.

Two nodes is what we expect – a head node and a worker node.
Since there are no Ray workloads running yet, there are no worker processes.
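One way to confirm this from the head node is sketched below. It is only a sketch, assuming Ray is installed and the cluster from the commands above is running; the helper names summarize_nodes and check_cluster are made up for illustration. It lists the alive nodes with their CPU counts, then submits a few trivial tasks so that worker processes get started.

```python
from typing import Dict, List, Tuple


def summarize_nodes(nodes: List[Dict]) -> List[Tuple[str, float]]:
    """Return (ip, cpu_count) for every alive node in a ray.nodes()-style list."""
    return [
        (n["NodeManagerAddress"], n.get("Resources", {}).get("CPU", 0.0))
        for n in nodes
        if n.get("Alive")
    ]


def check_cluster() -> None:
    """Hypothetical helper: connect to the running cluster and report on it."""
    import os

    import ray  # imported lazily so summarize_nodes works without Ray installed

    # "auto" connects to the cluster already started on this machine.
    ray.init(address="auto")

    for ip, cpus in summarize_nodes(ray.nodes()):
        print(f"{ip}: {cpus} CPUs")

    @ray.remote
    def whoami() -> int:
        # Each task reports the PID of the worker process that ran it.
        return os.getpid()

    # Submitting tasks is what causes worker processes to be started.
    pids = set(ray.get([whoami.remote() for _ in range(32)]))
    print(f"{len(pids)} distinct worker process(es) ran tasks")
```

Calling check_cluster() on the head node should list both machines if the worker joined correctly. While the tasks run, worker processes should also show up in the dashboard.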

The distinction between Ray worker nodes and Ray worker processes is confusing – I wish the terminology were better.

Let me know if I’ve understood your post correctly.

If I haven’t understood correctly, could you paste a screenshot of what you’re seeing in the Ray Dashboard?

Hey Dmitri! - thx for the quick reply.

So below you will find what I expected (perhaps wrongly). As you can see, there are as many workers as I have CPUs under the HEAD local host node. The workers are also referenced in the ‘WORKER’ tab when I click into the Nodes detail.

After running the on-prem setup against our 2 EC2 instances, I don’t see any registered workers when I expand the node, and the ‘WORKER’ tab shows 0 workers.

@aguo @sangcho might know more here

Hi,
Did you try adding the --node-ip-address=… argument when starting the head and the worker nodes? It should be set to that machine’s own IP address on the network.
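As a rough illustration of that suggestion, here is a small sketch that assembles the two ray start command lines with --node-ip-address added. The function name ray_start_command and the 10.0.0.x addresses are placeholders invented for this example; substitute each machine's actual private IP.

```python
import shlex


def ray_start_command(node_ip: str, head_ip: str, port: int = 6379,
                      is_head: bool = False) -> str:
    """Build a `ray start` command line that pins the node's own IP address."""
    args = ["ray", "start"]
    if is_head:
        # The head node opens the port that workers connect to.
        args += ["--head", f"--port={port}"]
    else:
        # Worker nodes point at the head node's address.
        args.append(f"--address={head_ip}:{port}")
    # Advertise this machine's own IP on the network.
    args.append(f"--node-ip-address={node_ip}")
    return shlex.join(args)


# Placeholder addresses, not taken from the original post.
print(ray_start_command("10.0.0.1", "10.0.0.1", is_head=True))
print(ray_start_command("10.0.0.2", "10.0.0.1"))
```

Each printed command would then be run on the corresponding machine.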