Ray cluster one worker plasma is N/A

vkodakanchi · June 7, 2022, 4:54pm

How severely does this issue affect your experience of using Ray?

None: Just asking a question out of curiosity.
Low: It annoys or frustrates me for a moment.
Medium: It contributes to significant difficulty in completing my task, but I can work around it.
High: It blocks me from completing my task.

High

Hi,

I am spinning up clusters manually, and almost everything works fine. But every single time, there is one worker node that doesn’t execute any tasks (I’ve tried cluster sizes from 3 to 20 nodes). I’ve SSH’ed into that worker node to check ray status, and everything seems OK. The only difference I can see between this worker node and the others is the Plasma value in the dashboard, which is ‘N/A’. Any help resolving this would be greatly appreciated. Thanks.
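One way to cross-check what the dashboard shows is to look at the node table from a driver. The sketch below is only a diagnostic aid, not a confirmed explanation of the 'N/A' value: it assumes that a node whose object store is not visible in the dashboard may also lack an object_store_memory entry in the resources reported by ray.nodes(). The helper name nodes_missing_object_store is hypothetical.

```python
def nodes_missing_object_store(nodes):
    """Return addresses of alive nodes that report no object_store_memory.

    `nodes` is a list of dicts shaped like the output of ray.nodes().
    Treating a missing resource as suspicious is an assumption made here.
    """
    suspicious = []
    for node in nodes:
        # Skip nodes the cluster already considers dead.
        if not node.get("Alive", False):
            continue
        resources = node.get("Resources", {})
        if resources.get("object_store_memory", 0) <= 0:
            suspicious.append(node.get("NodeManagerAddress", "<unknown>"))
    return suspicious


# On a machine that is part of the cluster, this could be used as:
#   import ray
#   ray.init(address="auto")
#   print(nodes_missing_object_store(ray.nodes()))
```

If the list comes back empty, the node table looks consistent and the problem is more likely on the dashboard or scheduling side.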

Alex · June 7, 2022, 6:45pm

Can you share more details about how you’re starting the node? If you’re using the autoscaler/cluster launcher can you share your config?

vkodakanchi · June 7, 2022, 8:43pm

Hi Alex,

I am starting the nodes manually and not using the autoscaler.

ray start --head --node-ip-address="x.x.x.x" --port=6379 --dashboard-host=x.x.x.x --dashboard-port=443

and for worker nodes

ray start --address="$head_node_ip:6379" --node-ip-address="y.y.y.y"

Thanks
Vishal.
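Since each worker is started with an explicit --node-ip-address, it may help to confirm that every address passed on the command line actually shows up as an alive node. The sketch below assumes the list of expected IPs is maintained by hand; the helper name unregistered_ips and the sample addresses are hypothetical, and the node dicts follow the shape returned by ray.nodes().

```python
def unregistered_ips(expected_ips, nodes):
    """Return expected IPs that do not appear among the alive nodes.

    `expected_ips` holds the values passed to --node-ip-address.
    `nodes` is a list of dicts shaped like the output of ray.nodes().
    """
    alive_addresses = {
        node.get("NodeManagerAddress")
        for node in nodes
        if node.get("Alive", False)
    }
    # Preserve the caller's order so the result is easy to compare by eye.
    return [ip for ip in expected_ips if ip not in alive_addresses]


# On a machine that is part of the cluster, this could be used as:
#   import ray
#   ray.init(address="auto")
#   print(unregistered_ips(["y.y.y.y"], ray.nodes()))
```

An address that is listed as missing here, while ray status on that machine looks healthy, would suggest the node registered under a different IP than the one expected.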

Dmitri · June 8, 2022, 8:35am

Just curious: what happens if you don’t pass the --node-ip-address flag? Why is it necessary in your case?

vkodakanchi · June 9, 2022, 5:35pm

Hi,

The issue still persists. We are using --node-ip-address because we start the Ray process inside a container on each node.

Thanks
Vishal

Dmitri · June 9, 2022, 11:36pm

Are you running ray start inside the worker node container?
I’d love to take a look at this. Would you mind opening a bug report with reproduction details?

vkodakanchi · June 10, 2022, 2:55am

Sure will do.

Thanks
Vishal Kodakanchi

vkodakanchi · June 13, 2022, 12:36am

Hi Dmitri,

Here is the bug report: https://github.com/ray-project/ray/issues/25711

Thanks
Vishal.

cade · June 14, 2022, 11:25pm

@Dmitri Can we resolve this issue and continue further discussion on the GitHub bug report?

Dmitri · June 14, 2022, 11:43pm

Carrying over to GitHub as recommended.
