Workers Not Recognized on new Cluster

neoearth · November 8, 2022, 10:17pm

High

Hey Team,

So I’m trying to manually set up a Ray cluster on two EC2 instances (one head, one worker node), with 16 CPUs each, and for some reason every time I apply the ‘manual installation’ steps it appears that the workers are not added correctly. They actually appear to be added as ‘head’ nodes, since they are not tucked under the ‘head node’ IP as they normally are when running locally.

On the head node I ran: ray start --head --port=6379

On the worker node I ran: ray start --address=HEAD_NODE_IP:6379

Open up the UI on the head node and sure enough, I see just two nodes with no workers.

I also tried setting the number of CPUs manually but without luck.

Dmitri · November 9, 2022, 5:36pm

Two nodes is what we expect – a head node and a worker node.
Since there are no Ray workloads running yet, there are no worker processes.

The distinction between Ray worker nodes and Ray worker processes is confusing – I wish the terminology were better.
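
To make the node/process distinction concrete, here is a minimal sketch that could be run on the head node. It assumes Ray is installed and that the cluster was started with the ray start commands from the original post; summarize_nodes and check_cluster are hypothetical helper names, not part of Ray. ray.nodes() lists the machines that have joined the cluster, while worker processes should only appear once tasks are actually submitted.

```python
def summarize_nodes(nodes):
    """Return (alive node IPs, total CPUs) from a list shaped like ray.nodes()."""
    alive = [n for n in nodes if n.get("Alive")]
    ips = sorted(n.get("NodeManagerAddress", "?") for n in alive)
    cpus = sum(n.get("Resources", {}).get("CPU", 0) for n in alive)
    return ips, cpus


def check_cluster(num_tasks=64):
    """Attach to a running cluster, list its nodes, then run a few tasks."""
    import ray  # imported here so summarize_nodes works without Ray installed

    ray.init(address="auto")  # connect to the cluster started by `ray start`
    ips, cpus = summarize_nodes(ray.nodes())
    print(f"alive nodes: {ips}, total CPUs: {cpus}")

    @ray.remote
    def where_am_i():
        import socket
        import time

        time.sleep(0.5)  # keep tasks busy long enough to spread across nodes
        return socket.gethostname()

    # Submitting tasks is what makes Ray start worker processes on the nodes.
    hosts = ray.get([where_am_i.remote() for _ in range(num_tasks)])
    print("tasks ran on hosts:", sorted(set(hosts)))
```

Calling check_cluster() on the head node should report both machines and 32 CPUs if the worker joined correctly, and the dashboard's worker list should fill in while the tasks are running.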

Let me know if I’ve understood your post correctly.

Dmitri · November 9, 2022, 5:38pm

If I haven’t understood correctly, could you paste a screenshot of what you’re seeing in the Ray Dashboard?

neoearth · November 10, 2022, 2:54pm

Hey Dmitri! - thx for the quick reply.

So below you will find what I expected (perhaps wrongly). As you can see, there are as many workers as I have CPUs under the HEAD local host node. The workers are also referenced in the ‘WORKER’ tab when I click into the Nodes detail.

After running the on-prem setup against our 2 EC2 instances, I don’t see any registered workers when I expand the node, and the ‘WORKER’ tab shows 0 workers.

Dmitri · November 10, 2022, 3:11pm

@aguo @sangcho might know more here

bananajoe182 · March 3, 2023, 6:27pm

Hi,
Did you try adding the --node-ip-address=… argument when starting the head and the worker nodes? This should be the IP address of that particular machine on the network.
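
As a rough illustration of that suggestion, the sketch below only builds and prints the two ray start command lines with --node-ip-address added. The IP addresses are made-up placeholders and ray_start_command is a hypothetical helper, not part of Ray; whether this resolves the issue above was not confirmed in the thread.

```python
import shlex


def ray_start_command(node_ip, head_ip=None, port=6379):
    """Build a `ray start` argv list; head_ip=None means this is the head node."""
    cmd = ["ray", "start"]
    if head_ip is None:
        cmd += ["--head", f"--port={port}"]
    else:
        cmd += [f"--address={head_ip}:{port}"]
    # Advertise this machine's own address to the rest of the cluster.
    cmd.append(f"--node-ip-address={node_ip}")
    return cmd


HEAD_IP = "10.0.0.1"    # placeholder: private IP of the head instance
WORKER_IP = "10.0.0.2"  # placeholder: private IP of the worker instance

print(shlex.join(ray_start_command(HEAD_IP)))
# ray start --head --port=6379 --node-ip-address=10.0.0.1
print(shlex.join(ray_start_command(WORKER_IP, head_ip=HEAD_IP)))
# ray start --address=10.0.0.1:6379 --node-ip-address=10.0.0.2
```

Each printed line would be run on the corresponding machine, with the placeholders replaced by the instances' real addresses.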
