I think it makes sense to run Ray on ARM64 devices for two reasons:
1. AWS Graviton instances are extremely efficient for learning, but they are ARM64. I think as time goes on we will see more ARM64 (e.g. the new MacBooks).
2. RLlib for robotics: most SoCs/embedded devices are ARM64 rather than x86-64. This is a pretty big blocker in applying RLlib to real-world problems.
I’m willing to help walk you through my build process, if this is something you are interested in supporting.
I tested this on Ubuntu 20.04 x86-64. From here, it's mostly building Python 3 packages from source via setup.py bdist_wheel.
Our Raspberry Pis would OOM during compilation, which is why I set up the cross-compile framework.
If you have an ARM machine with enough memory, you should be able to pip install most Ray deps (except py-spy, which needs Rust/Cargo to build). I installed torch from source, but it seems that from torch 1.8 onward they support ARM!
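If Rust isn't already on the machine, here's a minimal sketch of getting py-spy to build from source (the rustup one-liner is the standard installer; adjust to taste):
# Install Rust/Cargo so pip can compile py-spy, then install it
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
pip3 install py-spy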
I just ran this on my Jetson from a fresh OS install (Ubuntu 20.04):
# required by pyarrow
wget https://dist.apache.org/repos/dist/dev/arrow/KEYS
sudo apt-key add - < KEYS
DISTRO=$(lsb_release --codename --short)
add-apt-repository "deb [arch=arm64] http://dl.bintray.com/apache/arrow/ubuntu $DISTRO main"
sudo apt install python3-pip build-essential curl unzip psmisc liblapack-dev libblas-dev llvm libarrow-dev libarrow-python-dev libhdf5-dev
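# Optional: quick check that the Arrow libraries actually installed from the new repo
dpkg -l | grep libarrow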
pip3 install cython pytest torch torchvision
git clone https://github.com/ray-project/ray.git
# Install Bazel. 4.0 has a bug with protobuf, so use 3.7
wget https://github.com/bazelbuild/bazel/releases/download/3.7.0/bazel-3.7.0-linux-arm64
chmod +x ./bazel-3.7.0-linux-arm64
# Make sure bazel works
./bazel-3.7.0-linux-arm64
# Move it to where the Ray python build expects it
mkdir -p ~/.bazel/bin
mv bazel-3.7.0-linux-arm64 ~/.bazel/bin/bazel
# dm-tree needs bazel on the PATH
sudo ln -s ~/.bazel/bin/bazel /usr/local/bin/bazel
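# Optional: verify the symlink resolves and the expected Bazel version is picked up
bazel --version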
# ray ui
sudo apt install npm
pushd ray/dashboard/client
npm install
npm run build
popd
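# Optional: the built dashboard assets should end up under the client's build/ directory
# (path is an assumption on my part; check where npm run build actually put them)
ls ray/dashboard/client/build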
# rllib
cd ray/python
# The wheel builds successfully; you can stop here if the wheel is all you want
python3 setup.py bdist_wheel
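# The wheel lands in dist/; the exact filename depends on your Python version
ls dist/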
# Now let's install the wheel (and deps) to our current machine
# TensorFlow and OpenCV will fail to pip install due to some dumb issues. No problem,
# since torch works fine. It seems building TF from source on arm64 is supported and no big deal:
# https://collaborate.linaro.org/display/BDTS/Building+and+Installing+Tensorflow+on+AArch64
# We'll install the headless OpenCV package below instead, which works fine
pip3 install dist/ray-2.0.0.dev0-cp38-cp38-linux_aarch64.whl
cat requirements.txt requirements_rllib.txt | grep -v opencv | grep -v tensorflow | grep -v bazel | grep -v scikit-learn | grep -v reclaim | pip3 install -r /dev/stdin
pip3 install opencv-python-headless scikit-learn lz4
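As a quick sanity check before the full RLlib test, make sure the freshly built wheel imports cleanly:
python3 -c "import ray; print(ray.__version__)"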
Then test using:
python3
>>> from ray.rllib.agents.ppo import PPOTrainer
>>> from ray import tune
>>> tune.run(PPOTrainer, config={"env": "CartPole-v0", "framework": "torch"})
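If you want the smoke test to terminate on its own, tune.run also accepts a stop dict; something like this should work (the reward threshold is just an arbitrary example):
>>> tune.run(PPOTrainer,
...          config={"env": "CartPole-v0", "framework": "torch"},
...          stop={"episode_reward_mean": 150})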
There was already an aarch64 build available in https://github.com/ray-project/ray/issues/12128 as a Christmas gift. It seemed to work OK, but I'm just a Ray beginner, so don't take my word too seriously.
It would be really nice to also have regular aarch64 builds available that are compatible with the current Python from the miniforge releases (conda-forge/miniforge on GitHub), and probably also with Ubuntu 20.04 LTS.
For playing with Ray and learning on an SBC cluster it's fast enough…
@arayaday the above bash script should build Ray from source on Ubuntu 20.04 aarch64. Run it in a conda env and it will build for your specific conda Python. I'm still waiting for all the deps to install before I can run rllib --run DQN ... to verify.
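For reference, a full invocation might look something like this once everything is installed (the flags are from the rllib train CLI; the env and config values are just an arbitrary example):
rllib train --run DQN --env CartPole-v0 --config '{"framework": "torch"}'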
I had 8 GB of RAM and was down to ~400 MB during setup, so I wouldn't try this on anything with less than 8 GB of memory. I think this is a good reason to set up CI, so this can run on some AWS Graviton instance with 64 GB of RAM.
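Until CI exists, one thing that might help on smaller boards is capping Bazel's parallelism and RAM estimate in ~/.bazelrc before running setup.py. These are standard Bazel flags, but the numbers below are guesses and need tuning per device:
# ~/.bazelrc - keep the Ray compile from OOMing on low-memory ARM boards (values are guesses)
build --jobs=2
build --local_ram_resources=2048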