Ray 1.2.0 for armv7: no error messages, but the process is aborted

I have built a Ray wheel, ray-1.2.0-cp38-cp38-linux_armv7l.whl, but I have a problem with it.

The command:

pip install 'ray[rllib]'

works fine, but after running:

rllib train --run=PPO --env=CartPole-v0 -vv

the computation is aborted with no information about errors, and the dashboard doesn’t work. I have no idea how to debug it and will be grateful for any suggestions.

What’s the OS and the architecture? Are you using ARM?

It seems this is an ARM wheel. Could you post the Ray logs (/tmp/ray/session_latest/logs)?

@sangcho @rliaw The OS is Raspbian (Raspberry Pi 4), the architecture is armv7, and the Python version is 3.8.
Thank you for the info about the logs; I will check them and paste the result.

@rliaw The error logs are here:
https://github.com/PeterPirog/Raspberry_armv7_builds/tree/main/_error_logs

Hmm, I’ve never seen this error before…


[2021-02-27 10:23:38,372 C 8647 8657] dlmalloc.cc:103: failed to ftruncate file /dev/shm/plasmaTO48oB
[2021-02-27 10:23:38,372 E 8647 8657] logging.cc:415: *** Aborted at 1614417818 (unix time) try "date -d @1614417818" if you are using GNU date ***
[2021-02-27 10:23:38,422 E 8647 8657] logging.cc:415: PC: @        0x0 (unknown)
[2021-02-27 10:23:38,422 E 8647 8657] logging.cc:415: *** SIGABRT (@0x21c7) received by PID 8647 (TID 0xb23fb3f0) from PID 8647; stack trace: ***
[2021-02-27 10:23:38,423 E 8647 8657] logging.cc:415:     @ 0xb6b9a130 (unknown)
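To pull lines like these out of the many files under /tmp/ray/session_latest/logs, a small filter can help. This is a minimal sketch, assuming every relevant Ray log line has the shape shown above, `[date time,ms LEVEL pid tid] source:line: message`, and that the single letter is the severity (C and E appear in this crash; reading C as critical/fatal is my assumption). `parse_line` and `severe_records` are hypothetical helpers, not part of Ray.

```python
import re
from pathlib import Path
from typing import Iterator, NamedTuple, Optional, Tuple

# Matches Ray log lines such as:
# [2021-02-27 10:23:38,372 C 8647 8657] dlmalloc.cc:103: failed to ftruncate file ...
LOG_LINE = re.compile(
    r"^\[(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]) (?P<pid>\d+) (?P<tid>\d+)\] "
    r"(?P<source>[^:\s]+:\d+): (?P<message>.*)$"
)


class LogRecord(NamedTuple):
    time: str
    level: str  # single letter, e.g. "C" or "E" in the crash above (assumed severity)
    pid: int
    tid: int
    source: str  # C++ file and line, e.g. "dlmalloc.cc:103"
    message: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Hypothetical helper: return a LogRecord, or None if the line has another shape."""
    m = LOG_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    return LogRecord(m["time"], m["level"], int(m["pid"]), int(m["tid"]),
                     m["source"], m["message"])


def severe_records(log_dir: str, levels: str = "CE") -> Iterator[Tuple[str, LogRecord]]:
    """Hypothetical helper: yield (file name, record) for lines whose level is in `levels`."""
    base = Path(log_dir)
    if not base.is_dir():
        return  # no Ray session logs on this machine
    for path in sorted(base.iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        with path.open(errors="replace") as f:
            for line in f:
                rec = parse_line(line)
                if rec is not None and rec.level in levels:
                    yield path.name, rec


if __name__ == "__main__":
    for name, rec in severe_records("/tmp/ray/session_latest/logs"):
        print(f"{name}: {rec.time} {rec.level} {rec.source} {rec.message}")
```

Run on the Pi right after the crash, this lists only the C and E lines together with the log file they came from, which makes it easier to see which process (raylet, plasma store, worker) died first.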

How did you build the ARM wheel?

@sangcho I built it for Raspbian on the Raspberry Pi; here is how I built it:
https://github.com/PeterPirog/Raspberry_armv7_builds/blob/main/ray/How_to_build.txt

Now I am trying to build ray-1.2.0 again for Ubuntu on the Raspberry Pi, which changes the architecture from armv7 to aarch64.
On Raspbian the system architecture is armv7; on 64-bit Ubuntu it is aarch64.
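Since the wheel has to match the system, a quick sanity check is to compare a wheel's platform tag with what the machine reports. This is a minimal sketch, assuming wheel file names follow PEP 427 (`name-version-python-abi-platform.whl`, optionally with a build tag) and that `platform.machine()` returns the same string as `uname -m` (`armv7l` on 32-bit Raspbian, `aarch64` on 64-bit Ubuntu). The helper names are hypothetical, and this is only a rough check: pip's own rules (Python/ABI tags, manylinux glibc versions) are stricter.

```python
import platform
from typing import NamedTuple


class WheelName(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Hypothetical helper: split a wheel file name into its PEP 427 fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        del parts[2]  # optional build tag sits between version and python tag
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel file name: {filename}")
    return WheelName(*parts)


def wheel_matches_machine(filename: str, machine: str = "") -> bool:
    """Hypothetical helper: True if the platform tag fits `machine` (default: this machine)."""
    machine = machine or platform.machine()  # e.g. "armv7l" or "aarch64"
    tags = parse_wheel_filename(filename).platform_tag.split(".")  # compressed tag sets use "."
    return any(tag == "any" or tag.endswith("_" + machine) for tag in tags)


if __name__ == "__main__":
    for whl in ("ray-1.2.0-cp38-cp38-linux_armv7l.whl",
                "ray-1.2.0-cp38-cp38-linux_aarch64.whl"):
        print(whl, "->", wheel_matches_machine(whl))
```

On 64-bit Ubuntu only the aarch64 wheel should report True, which is why the armv7l build has to be redone for that system.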

Hmm, I am not really familiar with how this wheel works, so it could be hard for me to help. But according to this error, failed to ftruncate file /dev/shm/plasmaTO48oB, I think the issue is that ftruncate fails with your architecture and wheel. See ftruncate(2): truncate file to specified length - Linux man page (look at the ERRORS section).
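Until Ray prints the error code itself, one way to see which errno ftruncate(2) returns on a given filesystem is to repeat the call by hand. This is a minimal sketch, assuming the failure can be approximated by creating a file in /dev/shm and extending it to a chosen size; the probe sizes are arbitrary and `try_ftruncate` is a hypothetical helper. CPython is normally compiled with large-file support, so this will not reproduce a problem that only exists in a 32-bit C++ build; it only shows what the kernel returns for a given size.

```python
import errno
import os
import shutil
import tempfile


def _describe(exc: OSError) -> str:
    """Turn an OSError into 'ENAME: message', e.g. 'EINVAL: Invalid argument'."""
    name = errno.errorcode.get(exc.errno, str(exc.errno))
    return f"{name}: {os.strerror(exc.errno)}"


def try_ftruncate(directory: str, size_bytes: int) -> str:
    """Hypothetical helper: create a scratch file, ftruncate it, report the outcome."""
    try:
        fd, path = tempfile.mkstemp(prefix="ftruncate_probe_", dir=directory)
    except OSError as exc:
        return "cannot create file: " + _describe(exc)
    try:
        os.ftruncate(fd, size_bytes)
        return "ok"
    except OSError as exc:
        return _describe(exc)
    finally:
        os.close(fd)
        os.unlink(path)  # always remove the probe file


if __name__ == "__main__":
    shm = "/dev/shm"
    if os.path.isdir(shm):
        usage = shutil.disk_usage(shm)
        print(f"{shm}: total={usage.total} free={usage.free} bytes")
        # Arbitrary probe sizes: 1 MiB, half of the free space, 4 GiB.
        for size in (1 << 20, usage.free // 2, 4 << 30):
            print(size, try_ftruncate(shm, size))
```

The errno names printed here correspond directly to the ERRORS section of the man page (for example EINVAL, EFBIG or EPERM), which is the same information the proposed PR would add to Ray's message.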

Do you think it could be helpful if I make a PR to print the error code here?

@sangcho, I’m not sure what the abbreviation PR means, but if you have an idea how to solve the problem I will be grateful. In the meantime I am trying to build ray-1.2.0 for aarch64 Ubuntu, but the first step is to build PyTorch and TensorFlow for Python 3.8 on aarch64.

Ah, PR means pull request (I would print out the error code of ftruncate so that we can figure out why it doesn’t work on your architecture).

@sangcho Thank you for the explanation (I’m new to Python programming and building packages; machine learning became my hobby when COVID started, and I’m a metrology engineer, not a programmer). Now I am continuing to build TensorFlow and PyTorch wheels for aarch64. I hope Ubuntu is a better choice than Raspbian for the Raspberry Pi if I want to use Ray.

Yep. Lmk how this works. Meanwhile, I can push the PR to improve the error msg.

@sangcho @rliaw After many hours of struggling with the Raspberry Pi 4 and ray-1.2.0, it finally works :smiley:
Now I am trying to write a script that downloads my builds and installs them on a Raspberry Pi automatically.
The crucial points are:

  1. Using Ubuntu 20.10 with Python 3.8 as the OS, not Raspbian
  2. Installing the required libraries with apt-get
  3. Using package versions that work together with TensorFlow 2, PyTorch and Ray. I built some packages myself because no suitable versions were available. The TensorFlow wheel is over 100 MB, so I can’t upload it to GitHub. The installed package versions are:
absl-py==0.11.0
aiohttp==3.7.4
aiohttp-cors==0.7.0
aioredis==1.3.1
astunparse==1.6.3
async-timeout==3.0.1
atari-py==0.2.6
attrs==20.3.0
blessings==1.7
cachetools==4.2.1
certifi==2020.12.5
chardet==3.0.4
click==7.1.2
cloudpickle==1.6.0
colorama==0.4.4
colorful==0.5.4
dm-tree==0.1.5
filelock==3.0.12
flatbuffers==1.12
future==0.18.2
gast==0.3.3
google-api-core==1.26.0
google-auth==1.27.0
google-auth-oauthlib==0.4.2
google-pasta==0.2.0
googleapis-common-protos==1.53.0
gpustat==0.6.0
grpcio==1.36.1
gym @ file:///home/pi/RayProject/whls/gym-0.18.0-py3-none-any.whl
h5py==2.10.0
hiredis==1.1.0
idna==2.10
jsonschema==3.2.0
Keras-Preprocessing==1.1.2
lz4==3.1.3
Markdown==3.3.4
msgpack==1.0.2
multidict==5.1.0
numpy==1.20.1
nvidia-ml-py3==7.352.0
oauthlib==3.1.0
opencensus==0.7.12
opencensus-context==0.1.2
opencv-python==4.5.1.48
opencv-python-headless==3.4.13.47
opt-einsum==3.3.0
packaging==20.9
pandas==1.2.3
Pillow==7.2.0
prometheus-client==0.9.0
protobuf==3.15.4
psutil==5.8.0
py-spy==0.3.4
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyglet==1.5.0
pyparsing==2.4.7
pyrsistent==0.17.3
python-dateutil==2.8.1
pytz==2021.1
PyYAML==5.4.1
ray @ file:///home/pi/RayProject/whls/ray-1.2.0-cp38-cp38-linux_aarch64.whl
redis==3.5.3
requests==2.25.1
requests-oauthlib==1.3.0
rsa==4.7.2
scipy==1.6.1
six==1.15.0
tabulate==0.8.9
tensorboard==2.4.1
tensorboard-plugin-wit==1.8.0
tensorboardX==2.1
tensorflow @ file:///home/pi/RayProject/whls/tensorflow-2.4.1-cp38-none-linux_aarch64.whl
tensorflow-estimator==2.4.0
termcolor==1.1.0
torch @ file:///home/pi/RayProject/whls/torch-1.7.0a0-cp38-cp38-linux_aarch64.whl
typing-extensions==3.7.4.3
urllib3==1.26.3
Werkzeug==1.0.1
wrapt==1.12.1
yarl==1.6.3
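Because the list above is in pip freeze format, it can be saved to a file and reused on each Pi (pip install -r accepts the same format). This is a minimal sketch for checking an existing install against it, assuming the lines are stored in a text file (the name requirements_pi.txt is hypothetical) and using only the standard library; `importlib.metadata` needs Python 3.8+, which matches this setup. Lines pointing to local wheels (`name @ file:///...`) are recorded but not version-checked. Package-name normalization differs between Python versions, so a package reported as missing is worth double-checking with pip show.

```python
import os
from importlib import metadata
from typing import Dict, List, Tuple


def parse_freeze(lines: List[str]) -> Dict[str, str]:
    """Hypothetical helper: map each package name to its pinned version or wheel URL."""
    pins: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if " @ " in line:  # direct reference, e.g. "ray @ file:///home/pi/.../ray-...whl"
            name, _, ref = line.partition(" @ ")
        elif "==" in line:  # exact pin, e.g. "numpy==1.20.1"
            name, _, ref = line.partition("==")
        else:
            continue  # other requirement forms are ignored in this sketch
        pins[name.strip()] = ref.strip()
    return pins


def mismatches(pins: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: (name, pinned, installed) for every '==' pin not satisfied."""
    result = []
    for name, pinned in pins.items():
        if pinned.startswith("file:"):
            continue  # local wheel: its version is in the file name, not checked here
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = "missing"
        if installed != pinned:
            result.append((name, pinned, installed))
    return result


if __name__ == "__main__":
    req = "requirements_pi.txt"  # hypothetical file holding the list above
    if os.path.exists(req):
        with open(req) as f:
            for name, pinned, installed in mismatches(parse_freeze(f.readlines())):
                print(f"{name}: pinned {pinned}, installed {installed}")
```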

Result:

sudo apt-get update && sudo apt-get upgrade
sudo apt-get install python3-dev python3-pip python3-venv python3-wheel -y
sudo apt-get install build-essential cmake unzip pkg-config

sudo apt-get -y install libjpeg-dev libpng-dev libtiff-dev
sudo apt-get -y install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
sudo apt-get -y install libxvidcore-dev libx264-dev

sudo apt-get -y install libgtk-3-dev npm

sudo apt-get -y install libcanberra-gtk*

sudo apt-get -y install libatlas-base-dev gfortran

sudo apt-get -y install python3-dev

sudo apt-get -y install libaom0 libatlas3-base libavcodec58 libavformat58 libavutil56 libbluray2 libcairo2 libchromaprint1 libcodec2-0.8.1 libcroco3 libdatrie1 libdrm2 libfontconfig1 libgdk-pixbuf2.0-0 libgfortran5 libgme0 libgraphite2-3 libgsm1 libharfbuzz0b libilmbase23 libjbig0 libmp3lame0 libmpg123-0 libogg0 libopenexr23 libopenjp2-7 libopenmpt0 libopus0 libpango-1.0-0 libpangocairo-1.0-0 libpangoft2-1.0-0 libpixman-1-0 librsvg2-2 libshine3 libsnappy1v5 libsoxr0 libspeex1 libssh-gcrypt-4 libswresample3 libswscale5 libthai0 libtheora0 libtiff5 libtwolame0 libva-drm2 libva-x11-2 libva2 libvdpau1 libvorbis0a libvorbisenc2 libvorbisfile3 libvpx5 libwavpack1 libwebp6 libwebpmux3 libx264-155 libx265-165 libxcb-render0 libxcb-shm0 libxfixes3 libxrender1 libxvidcore4 libzvbi0

sudo apt-get install gcc libpq-dev -y
sudo apt-get install python-dev  python-pip -y

sudo apt-get install -y libatomic-ops-dev python3-testresources
sudo apt install -y git
sudo apt-get install build-essential openjdk-11-jdk python zip unzip libgirepository1.0-dev
sudo apt-get install pkg-config libhdf5-dev libhdf5-hl-100
sudo apt install libopenblas-dev libblas-dev m4 cmake python3-dev python3-yaml python3-setuptools
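Several packages in these commands are requested more than once (python3-dev, build-essential, cmake, unzip and pkg-config). That is harmless, but for an automatic install script a single deduplicated command can be easier to maintain. This is a minimal sketch, assuming each line is a plain apt or apt-get call like the ones above; `merge_apt_installs` is a hypothetical helper.

```python
import shlex
from typing import List

SEPARATORS = {"&&", "||", ";"}


def merge_apt_installs(commands: List[str]) -> str:
    """Hypothetical helper: merge install lines into one command, keeping first-seen order."""
    packages: List[str] = []
    seen = set()
    for cmd in commands:
        tokens = shlex.split(cmd)
        if "install" not in tokens:
            continue  # e.g. "sudo apt-get update && sudo apt-get upgrade"
        for tok in tokens[tokens.index("install") + 1:]:
            if tok in SEPARATORS:
                break  # stop at a chained command
            if tok.startswith("-"):
                continue  # flags such as -y
            if tok not in seen:
                seen.add(tok)
                packages.append(tok)
    # shlex.quote keeps patterns like libcanberra-gtk* from being expanded by the shell
    return "sudo apt-get install -y " + " ".join(shlex.quote(p) for p in packages)


if __name__ == "__main__":
    print(merge_apt_installs([
        "sudo apt-get install build-essential cmake unzip pkg-config",
        "sudo apt-get -y install python3-dev",
        "sudo apt install libopenblas-dev libblas-dev m4 cmake python3-dev",
    ]))
```

One trade-off: apt-get aborts the whole command if any single package name cannot be found, so keeping the original separate commands can make a missing package easier to spot.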

Great! Thanks a bunch for sharing this :slight_smile:

@rliaw @sangcho, thank you for the inspiration. Maybe in the future it will be possible to install Ray on the Raspberry Pi directly with pip install. The idea of a control system with Raspberry Pi devices (with digital and analog converters and hardware modules) as workers and a PC as the head node is very attractive in my opinion. There is no need to convert data between the Raspberry Pi and the PC, because the Raspberry Pi becomes part of the ML cluster.
The huge advantage is the ability to easily use data from a real physical process for real-time training and process modeling.