- Low: It annoys or frustrates me for a moment.
Ray is a cool thing and I hope this could help make it a little bit better. I tried to figure out how to run a web app on top of Ray, and here is what I encountered.
- It turned out the local Python version matters when one uses serve deploy: the local machine and the remote (the rayproject/ray Docker image) need the same Python version. It's 3.7 for rayproject/ray:latest. I haven't found any mention of this in the docs (a quick version check is sketched right below).
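A minimal way to compare the two interpreters, assuming the default container set up by cluster.yaml (ray exec runs the command inside it):

# Python on the local machine
python --version
# Python inside the Ray container on the head node
ray exec cluster.yaml 'python --version'

If they differ, it looks like the rayproject/ray image also comes in Python-version-specific tags (the -py38 / -py310 variants), so pinning one of those instead of rayproject/ray:latest should avoid the mismatch.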
- The rsync-up command doesn't work for me, even though it reports no errors or other issues. I don't see the file in /home/ray inside the container (checked via ray attach cluster.yaml) or in any other path. A couple of check commands are shown after the log below.
ray rsync-up -v cluster.yaml src/multiple_deployment/greet.py /home/ray
2023-05-01 22:15:19,308 INFO util.py:376 -- setting max workers for head node type to 0
Loaded cached provider configuration from /tmp/ray-config-1f0d8960c2be5c525c5505ac1383bf757e6c84d2
If you experience issues with the cloud provider, try re-running the command with --no-config-cache.
Creating AWS resource `ec2` in `us-west-2`
Creating AWS resource `ec2` in `us-west-2`
Fetched IP: 35.90.93.9
Running `mkdir -p /tmp/ray_tmp_mount/default/home && chown -R ubuntu /tmp/ray_tmp_mount/default/home`
Shared connection to 35.90.93.9 closed.
Running `rsync --rsh ssh -i /home/q/.ssh/ray-autoscaler_us-west-2.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_7694f4a663/c21f969b5f/%C -o ControlPersist=10s -o ConnectTimeout=120s -avz --exclude **/.git --exclude **/.git/** --filter dir-merge,- .gitignore greet.py ubuntu@35.90.93.9:/tmp/ray_tmp_mount/default/home/ray`
sending incremental file list
sent 54 bytes received 12 bytes 44,00 bytes/sec
total size is 437 speedup is 6,62
Running `docker inspect -f '{{.State.Running}}' ray_container || true`
Shared connection to 35.90.93.9 closed.
Running `docker exec -it ray_container /bin/bash -c 'mkdir -p /home' && rsync -e 'docker exec -i' -avz /tmp/ray_tmp_mount/default/home/ray ray_container:/home/ray`
sending incremental file list
sent 58 bytes received 12 bytes 140.00 bytes/sec
total size is 437 speedup is 6.24
Shared connection to 35.90.93.9 closed.
`rsync`ed greet.py (local) to /home/ray (remote)
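Two commands to double-check where the file ends up, as a sketch (the staging path, key and IP are taken from the log above and will differ for other setups):

# Inside the head node's Ray container
ray exec cluster.yaml 'ls -la /home/ray'
# In the staging directory on the EC2 host that the autoscaler rsyncs into first
ssh -i /home/q/.ssh/ray-autoscaler_us-west-2.pem ubuntu@35.90.93.9 'ls -la /tmp/ray_tmp_mount/default/home/ray'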
- The cluster.yaml file located in the Getting Started Guide (Ray 2.4.0 docs) is not accepted by Ray: it raises a TypeError on VolumeSize: 140GB, where the correct value is VolumeSize: 140.
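The one-line fix, as a sketch (editing the YAML by hand works just as well; cluster.yaml here is whatever the downloaded config is called locally):

# Drop the unit suffix so VolumeSize is a plain integer (EC2 takes the size in GiB)
sed -i 's/VolumeSize: 140GB/VolumeSize: 140/' cluster.yaml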
- The production deployment guide (https://docs.ray.io/en/latest/serve/production-guide/deploy-vm.html#) mentions some ports that are necessary, but it's really messy, and it is not obvious that one can reach them via e.g. SSH local port forwarding.
E.g. to run ray commands one only needs cluster.yaml. To use serve deploy one needs the dashboard agent port (DASHBOARD_AGENT_PORT), which is 52365. How to use it is clearly stated in the CLI help message:
-a, --address TEXT Address to use to query the Ray dashboard agent
(defaults to http://localhost:52365). Can also be
specified using the RAY_AGENT_ADDRESS environment
variable.
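So with port 52365 forwarded (the ssh commands are at the end of this post), the deploy step looks roughly like this; serve_config.yaml is just a placeholder name for the Serve config file:

# Deploy the Serve config through the forwarded dashboard agent port
serve deploy serve_config.yaml -a http://localhost:52365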
To use serve run, RAY_AGENT_ADDRESS is supposedly needed, pointing at port 10001. But actually the variable is named RAY_ADDRESS. One can use it like RAY_ADDRESS=ray://localhost:10001 serve run asdf_deployment:app, and that is absolutely not obvious from the CLI, e.g.:
-a, --address TEXT Address to use for ray.init(). Can also be
specified using the RAY_ADDRESS environment
variable.
So, to deploy the payload to e.g. an AWS-based cluster, one needs to have these two tunnels running first:
ssh -L 52365:localhost:52365 -nNT -i /home/q/.ssh/ray-autoscaler_us-west-2.pem -v ubuntu@<HEAD-IP>
ssh -L 10001:localhost:10001 -nNT -i /home/q/.ssh/ray-autoscaler_us-west-2.pem -v ubuntu@<HEAD-IP>
The examples are available here.
Thank you.