Question about ray/python/ray/autoscaler/local/example-full.yaml

How severely does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

I ran ray up ray/python/ray/autoscaler/local/example-full.yaml on the head machine, and it works. But when I try to connect the worker to the head, I find that the worker's Python version is different from the head's, and I get the following error.

RuntimeError: Version mismatch: The cluster was started with:
    Ray: 1.13.0
    Python: 3.7.7
This process on node work machine was started with:
    Ray: 1.13.0
    Python: 3.10.4

I guess the reason is that the head was launched with Python 3.7.7, while the worker machine, where I run ray start --address='XXXX:6388' to connect, has Python 3.10.4. I cannot find the Dockerfile where the Python version is configured.
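A quick way to confirm the mismatch before running ray start on the worker is to compare the local versions with the ones the cluster reports. The following is only a minimal sketch: find_mismatches is a hypothetical helper (not part of Ray), and the cluster versions are copied by hand from the error message above.

```python
import platform
from importlib import metadata


def find_mismatches(cluster, local):
    """Return the keys (e.g. "Ray", "Python") whose versions differ."""
    return [key for key in cluster if cluster[key] != local.get(key)]


# Versions reported for the cluster in the error message above.
cluster_versions = {"Ray": "1.13.0", "Python": "3.7.7"}

# Versions on the machine running this script. The Ray version is read from
# the installed package metadata, so Ray itself is never imported here.
local_versions = {"Python": platform.python_version()}
try:
    local_versions["Ray"] = metadata.version("ray")
except metadata.PackageNotFoundError:
    local_versions["Ray"] = None  # Ray is not installed in this environment

for key in find_mismatches(cluster_versions, local_versions):
    print(f"{key}: cluster has {cluster_versions[key]}, "
          f"this machine has {local_versions[key]}")
```

If this prints nothing, the interpreter that ran the script matches the versions listed for the cluster. Make sure it is the same interpreter that ray start uses.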

Hi @mabodx
Thanks for posting this question.
The Python version is determined by the Docker image you use. For example, "rayproject/ray:latest-gpu" uses Python 3.7.7.

Is there any reason you want to run "ray start" on your worker machine to join the cluster manually? Can it be part of the cluster started with "ray up"?

cc @rickyyx


Because I found that the worker machine is not included in the cluster after the head is launched.
After I launch the head, it says:

Next steps
  To connect to this Ray runtime from another node, run
    ray start --address='head_machine_ip:6382'

When I check the worker status, it says:

Healthy:
 1 local.cluster.node
Pending:
 worker_ip: local.cluster.node, waiting-for-ssh
Recent failures:
 (no failures)
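The waiting-for-ssh state suggests that the head has not yet been able to open an SSH session to the worker. As a first check, one could test from the head machine whether the worker's SSH port accepts TCP connections at all. This is a minimal sketch under stated assumptions: ssh_port_reachable is a hypothetical helper, port 22 is assumed to be the SSH port, and "worker_ip" is a placeholder for the real address.

```python
import socket


def ssh_port_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    worker = "worker_ip"  # placeholder: replace with the worker's address
    state = "reachable" if ssh_port_reachable(worker) else "not reachable"
    print(f"SSH port on {worker} is {state}")
```

A successful connection only shows that the port is open. It does not prove that the SSH user and key configured in the cluster YAML are accepted by the worker.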

Are you saying when you run ray up cluster.yaml, the worker node is pending forever so you have to manually run ray start to start one?

What’s your cluster.yaml setup: how many worker nodes are there?

Yes, there is only 1 worker.