I configured a Ray cluster by following the "Using Ray cluster launcher" tutorial, but ran into some problems.
My requirements are as follows:
There are currently three machines A, B, and C.
I ran the following command on A, and I want A itself to be a node of the cluster (the head node). How can I do that?
```
ray up example-full.yaml
```
Here example-full.yaml looks like this:
```yaml
docker:
    image: "rayproject/ray-ml:latest-cpu"
    container_name: "ray_container"
    pull_before_run: True
    run_options:  # Extra options to pass into "docker run"
        - --ulimit nofile=65536:65536

provider:
    type: local
    head_ip: <ipA>
    worker_ips: [ipB, ipC]
```
I tried setting head_ip to A's public IP (10.122.134.131), but it failed. Ray tries to SSH to this IP, and the error looks like this:
```
[1/7] Waiting for SSH to become available
Running `uptime` as a test.
Fetched IP: 10.122.134.131
ssh_exchange_identification: read: Connection reset by peer
SSH still not available (SSH command failed.), retrying in 5 seconds.
ssh_exchange_identification: read: Connection reset by peer
SSH still not available (SSH command failed.), retrying in 5 seconds.
```
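The `ssh_exchange_identification: read: Connection reset by peer` message comes from the OpenSSH client when the connection is reset while it is reading the server's identification line, so the same failure can probably be reproduced outside of `ray up`. Below is a minimal sketch of such a probe. It assumes the SSH server on A listens on the default port 22, and `read_ssh_banner` is a hypothetical helper name, not part of Ray.

```python
import socket


def read_ssh_banner(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Connect to host:port and return the first line the server sends.

    An SSH server normally greets the client with a line starting with "SSH-".
    Raises OSError (for example ConnectionResetError or a timeout) on failure.
    """
    # Open a plain TCP connection; no authentication is attempted.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = sock.recv(256)
    if not data:
        # The peer closed the connection without sending an identification line.
        raise ConnectionError("connection closed before any banner was sent")
    return data.decode("ascii", errors="replace").splitlines()[0]
```

Calling `read_ssh_banner("10.122.134.131")` on A should return a line starting with `SSH-` if the SSH server accepts connections from that address. If it raises `ConnectionResetError` instead, the problem is likely in the SSH setup on A rather than in the Ray configuration.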
My goal is simple: create a cluster from A, B, and C, running the ray up command on one of those machines. Please help, thanks.