Ray Cluster Launcher on GCE 'ray: command not found'

I am having trouble setting up a cluster on GCE. Running

ray up cluster.yaml

fails with the following error:

[7/7] Starting the Ray runtime
bash: ray: command not found

Here is my cluster.yaml file:

cluster_name: ray-expr

max_workers: 0 
provider:
   type: gcp
   region: us-west1
   availability_zone: us-west1-b
   project_id: hs-deep-lab-donoho # Globally unique project id

auth:
   ssh_user: ubuntu
setup_commands:
  - pip3 install --upgrade pip
  - pip3 install "ray[all]"

@zhz @richardliaw, it would be great if you can take a look at this.

cc @Dmitri, can you also take a look at this?

It looks like the bin directory containing the ray executable didn’t end up on the head node’s PATH.
You can use the setup commands here as a reference.
The example configs in that directory should be generally helpful.
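To confirm this diagnosis on the head node, here is a minimal diagnostic sketch. It assumes you can run it over SSH with the same python3 that pip3 installs into. It checks whether ray resolves on PATH and, if not, which of that interpreter's script directories (including the per-user one, e.g. ~/.local/bin on Linux) are missing from PATH. The helper names candidate_script_dirs, missing_from_path and diagnose are hypothetical. The PATH seen by the cluster launcher's SSH session can differ from the one in your own shell, so treat the output as a hint rather than a verdict.

```python
import os
import shutil
import sysconfig


def candidate_script_dirs():
    """Directories where pip may put console scripts such as `ray` for the
    running interpreter: the default scheme's scripts dir and the per-user
    one (e.g. ~/.local/bin on Linux for user installs)."""
    dirs = [sysconfig.get_path("scripts")]
    user_scheme = f"{os.name}_user"  # "posix_user" or "nt_user"
    if user_scheme in sysconfig.get_scheme_names():
        dirs.append(sysconfig.get_path("scripts", user_scheme))
    return list(dict.fromkeys(dirs))  # drop duplicates, keep order


def missing_from_path(dirs, path_env=None):
    """Return the entries of `dirs` that do not appear on PATH
    (or on `path_env`, if given)."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    on_path = {os.path.normpath(p) for p in path_env.split(os.pathsep) if p}
    return [d for d in dirs if os.path.normpath(d) not in on_path]


def diagnose(executable="ray"):
    """Print where `executable` resolves, or which script dirs are off PATH."""
    found = shutil.which(executable)
    if found:
        print(f"{executable} found at {found}")
        return
    print(f"{executable}: not found on PATH")
    for d in missing_from_path(candidate_script_dirs()):
        if os.path.exists(os.path.join(d, executable)):
            print(f"  {executable} exists in {d}, which is not on PATH")
        else:
            print(f"  checked {d} (not on PATH)")


if __name__ == "__main__":
    diagnose()
```

If the script reports that ray exists in a directory that is not on PATH, prepending that directory to PATH in the setup commands (or invoking ray by its full path) is the kind of fix the example configs are meant to illustrate.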

Yeah, I suspect Ray was installed by pip3 into a location that isn’t on the PATH, so the shell can’t find the ray command. @Dmitri’s suggestion sounds good. Please let us know if you run into more problems!

Using example-minimal.yaml resulted in the following errors:

[6/7] Running setup commands
    (0/3) wget https://repo.anaconda.com...
--2021-03-03 20:27:49--  https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.131.3, 104.16.130.3, 2606:4700::6810:8303, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.131.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 94235922 (90M) [application/x-sh]
Saving to: ‘/home/ubuntu/anaconda3.sh’

/home/ubuntu/anaconda3.sh 100%[===================================>]  89.87M   163MB/s    in 0.6s    

2021-03-03 20:27:50 (163 MB/s) - ‘/home/ubuntu/anaconda3.sh’ saved [94235922/94235922]

ERROR: File or directory already exists: '/home/ubuntu/anaconda3'
If you want to update an existing installation, use the -u option.
Shared connection to 35.230.97.102 closed.
    (1/3) pip install -U https://s3-us-w...
ERROR: ray-2.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl is not a supported wheel on this platform.
Shared connection to 35.230.97.102 closed.
2021-03-03 12:27:52,026	INFO node_provider.py:20 -- wait_for_compute_zone_operation: Waiting for operation operation-1614803271323-5bca7af5e9025-0d484c4e-85262e0b to finish...
2021-03-03 12:27:57,651	INFO node_provider.py:32 -- wait_for_compute_zone_operation: Operation operation-1614803271323-5bca7af5e9025-0d484c4e-85262e0b finished.
  New status: update-failed
  !!!
  SSH command failed.
  !!!
  
  Failed to setup head node.
zsh: exit 1     ray up example-minimal.yaml -y

(The second error, the unsupported wheel, is the critical one.)
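"is not a supported wheel on this platform" means pip found none of the wheel's tags among those the target interpreter accepts. One common cause is a Python version mismatch: the cp37 in ray-2.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl means the wheel targets CPython 3.7. This sketch, using hypothetical helper names, splits a wheel filename into its PEP 427 tags and compares only the Python tag against the running interpreter. It assumes CPython and ignores the ABI and platform tags that pip also checks.

```python
import sys


def parse_wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag).

    PEP 427 layout: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when the optional build tag is present
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def accepted_python_tags():
    """Python tags this interpreter accepts, assuming CPython (e.g. cp37, py3, py37)."""
    major, minor = sys.version_info[:2]
    return {f"cp{major}{minor}", f"py{major}", f"py{major}{minor}"}


def python_tag_matches(filename):
    """True if the wheel's Python tag covers this interpreter.

    pip also checks the ABI and platform tags, so True does not guarantee
    the wheel installs; False is enough on its own to make pip reject it.
    """
    python_tag, _, _ = parse_wheel_tags(filename)
    # Compressed tag sets such as "py2.py3" are dot-separated.
    return any(tag in accepted_python_tags() for tag in python_tag.split("."))


if __name__ == "__main__":
    wheel = "ray-2.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl"
    print(parse_wheel_tags(wheel), "matches:", python_tag_matches(wheel))
```

Run on the head node, python_tag_matches("ray-2.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl") is True only under Python 3.7. A False result points at the node's Python version rather than at the wheel itself.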

ray up example-full.yaml worked!


The issue with example-minimal is resolved in master as of

Glad things worked with example-full!