Inconveniences and minor issues, or a short report from someone who hadn't used Ray before

  • Low: It annoys or frustrates me for a moment.

Ray is a cool thing, and I hope this report can help make it a little bit better. I tried to figure out how to run a web app on top of Ray, and here is what I encountered.

  1. It turns out the local Python version matters when one uses serve deploy: the local machine and the remote side (the rayproject/ray Docker image) must have the same Python version, which is 3.7 for rayproject/ray:latest. I haven’t found any mention of this in the docs (a quick check is sketched after the log below).
  2. ray rsync-up doesn’t work for me, even though it reports no errors or issues. The file does not show up in /home/ray inside the container (checked via ray attach cluster.yaml) or in any other path:
ray rsync-up -v cluster.yaml src/multiple_deployment/greet.py /home/ray
2023-05-01 22:15:19,308 INFO util.py:376 -- setting max workers for head node type to 0
Loaded cached provider configuration from /tmp/ray-config-1f0d8960c2be5c525c5505ac1383bf757e6c84d2
If you experience issues with the cloud provider, try re-running the command with --no-config-cache.
Creating AWS resource `ec2` in `us-west-2`
Creating AWS resource `ec2` in `us-west-2`
Fetched IP: 35.90.93.9
Running `mkdir -p /tmp/ray_tmp_mount/default/home && chown -R ubuntu /tmp/ray_tmp_mount/default/home`
Shared connection to 35.90.93.9 closed.
Running `rsync --rsh ssh -i /home/q/.ssh/ray-autoscaler_us-west-2.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_7694f4a663/c21f969b5f/%C -o ControlPersist=10s -o ConnectTimeout=120s -avz --exclude **/.git --exclude **/.git/** --filter dir-merge,- .gitignore greet.py ubuntu@35.90.93.9:/tmp/ray_tmp_mount/default/home/ray`
sending incremental file list

sent 54 bytes  received 12 bytes  44,00 bytes/sec
total size is 437  speedup is 6,62
Running `docker inspect -f '{{.State.Running}}' ray_container || true`
Shared connection to 35.90.93.9 closed.
Running `docker exec -it  ray_container /bin/bash -c 'mkdir -p /home'  && rsync -e 'docker exec -i' -avz /tmp/ray_tmp_mount/default/home/ray ray_container:/home/ray`
sending incremental file list

sent 58 bytes  received 12 bytes  140.00 bytes/sec
total size is 437  speedup is 6.24
Shared connection to 35.90.93.9 closed.
`rsync`ed greet.py (local) to /home/ray (remote)
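Back to point 1: a quick way to catch the version mismatch up front is to compare the local interpreter with the one inside the Ray image. A minimal sketch, assuming Docker is installed locally; adjust the image tag to whatever your cluster.yaml references:

# Print the local Python version, then the one inside the Ray image.
python --version
docker run --rm rayproject/ray:latest python --version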
  3. The cluster.yaml file located in the Getting Started Guide — Ray 2.4.0 is not accepted by Ray: it raises a TypeError on VolumeSize: 140GB, where the correct value is VolumeSize: 140.

  4. The production deployment guide (https://docs.ray.io/en/latest/serve/production-guide/deploy-vm.html) mentions some ports that are necessary, but the description is really messy, and it is not obvious that one can reach these ports via, e.g., SSH local port forwarding.

For example, to run ray commands one only needs cluster.yaml.
To use serve deploy one needs the DASHBOARD_AGENT_PORT, which is 52365. How to use it is clearly stated in the CLI help message:

  -a, --address TEXT  Address to use to query the Ray dashboard agent
                      (defaults to http://localhost:52365). Can also be
                      specified using the RAY_AGENT_ADDRESS environment
                      variable.

To use serve run, port 10001 is needed, and the help above suggests RAY_AGENT_ADDRESS, but the variable is actually named RAY_ADDRESS. One can use it like RAY_ADDRESS=ray://localhost:10001 serve run asdf_deployment:app. This is absolutely not obvious from the CLI help, e.g.:

  -a, --address TEXT           Address to use for ray.init(). Can also be
                               specified using the RAY_ADDRESS environment
                               variable.

So, to deploy the payload to, e.g., an AWS-based cluster, one needs to run these first:

ssh -L 52365:localhost:52365 -nNT -i /home/q/.ssh/ray-autoscaler_us-west-2.pem -v ubuntu@<HEAD-IP>
ssh -L 10001:localhost:10001 -nNT -i /home/q/.ssh/ray-autoscaler_us-west-2.pem -v ubuntu@<HEAD-IP>
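
With these two tunnels in place, the deploy commands work as if the cluster were local. A sketch, assuming a Serve config file named config.yaml (a placeholder name) and the asdf_deployment:app import path from above:

# Deploys via the dashboard agent forwarded to localhost:52365 (the default address).
serve deploy config.yaml
# Runs interactively through the Ray Client port forwarded to localhost:10001.
RAY_ADDRESS=ray://localhost:10001 serve run asdf_deployment:app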

The examples are available here

Thank you.

This is awesome feedback. Thank you so much for putting this together!

Do you want to make any PRs to help improve the documentation? Let me know - I’d be happy to help shepherd!

cc @Akshay_Malik for the serve issues

cc @architkulkarni on the rsync-up issue and the cluster YAML

Thanks for reporting this!

  1. As for rsync-up, would you mind filing an issue on the Ray GitHub? This sounds like a Ray bug. But if we back up to look at your use case, would using the working_dir field of a Runtime Environment work for you? Environment Dependencies — Ray 2.4.0

  2. The VolumeSize error was fixed recently on master, and the fix should land in the next Ray release!

Hi @architkulkarni,

  1. Here it is: The ray rsync-up cli reports no issue, but actually file is absent on remote side (Ray AWS cluster) · Issue #35051 · ray-project/ray · GitHub.

Please take a look at whether working_dir fits my case: the CLI doesn’t support any runtime_env arguments, including working_dir, and the Python rsync() code is only accessible as an implementation detail.

Hi @rliaw,

Basically, I wouldn’t mind making PRs and fixes.
Sorry for the off-topic question, but I would like to ask: might Anyscale consider hiring me as a software developer? I have extensive experience in IT across different domains and technologies.

Thanks for the issue! Yeah, you wouldn’t be able to use the CLI with working_dir, but you’d specify it in your Ray Job when running your workload. (runtime_env is scoped to a Ray job, not a cluster, but it’s cached, so you don’t have to re-upload it for each job.)
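
For example, a minimal sketch of the job-based approach, assuming the Ray Jobs port 8265 is forwarded to localhost the same way as the other ports above:

# Forward the Jobs API port first, e.g.:
# ssh -L 8265:localhost:8265 -nNT -i /home/q/.ssh/ray-autoscaler_us-west-2.pem ubuntu@<HEAD-IP>
# Then submit the workload; --working-dir uploads the local directory
# to the cluster as the job's runtime_env working_dir.
ray job submit --address http://localhost:8265 --working-dir . -- python greet.py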