Hey @rabraham Thanks for your feedback, and glad to see you are trying to use container in Runtime Environments!
First, I should say I’m sorry that the documentation for container is insufficient and makes container runtime environments hard to use. We will enhance this part soon.
And about the container functionality, I should describe its current state:
- We don’t support using container with “ray submit” yet.
- If you want to use this functionality, here are the requirements for your cluster environment:
- You should install “podman” on the hosts of your Ray cluster because we need it to start containers. You should make sure “podman” can start containers on your hosts, e.g. with the command:
podman run --cgroup-manager=cgroupfs --network=host --pid=host --ipc=host --env-host --entrypoint python docker.io/rayproject/ray:1.12.0
- Your image should include a Python whose version matches your Ray cluster’s, and that Python environment should already have the same Ray version as your cluster installed as well.
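To sanity-check the version requirement above, a small helper like the following can compare the Python/Ray versions reported inside the image with those on the host. This is my own sketch, not part of Ray; the `podman run` invocation is an assumption based on the command above:

```python
import subprocess
import sys


def image_versions(image):
    """Ask a container image which Python and Ray versions it ships.

    Runs `python -c ...` inside the image via podman and returns a
    string like "3.7.7|1.12.0". Requires podman and the image locally.
    """
    cmd = [
        "podman", "run", "--rm", "--entrypoint", "python", image,
        "-c",
        "import sys, ray; "
        "print('%d.%d.%d|%s' % (*sys.version_info[:3], ray.__version__))",
    ]
    return subprocess.check_output(cmd, text=True).strip()


def versions_match(image_ver, host_ver):
    """Compare two "PYTHON|RAY" version strings."""
    return image_ver == host_ver


# Example (requires podman and a local ray installation):
# import ray
# host = "%d.%d.%d|%s" % (*sys.version_info[:3], ray.__version__)
# versions_match(image_versions("docker.io/rayproject/ray:1.12.0"), host)
```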
Following what I said above, here is an example from my side:
Create a Python 3.7.7 environment with pyenv:
pyenv install 3.7.7
pyenv virtualenv 3.7.7 ray-opensource-3.7.7
pyenv activate ray-opensource-3.7.7
Create an image from the Ray image; the Dockerfile:
FROM rayproject/ray:1.12.0
COPY pip.conf /home/ray/.pip/pip.conf
RUN /home/ray/anaconda3/bin/pip install -U pip
RUN /home/ray/anaconda3/bin/pip install tensorflow==1.15
USER root
CMD source /home/ray/.bashrc
Build the image with podman (you could also use the docker command):
podman build -t raytest/container:v1 .
Start a Ray cluster:
ray start --head
Install tensorflow for the driver (your main.py):
pip install tensorflow==1.15
Start your driver; the code:
# main_container.py
import time

import tensorflow as tf

DEPLOY_TIME = time.time()


class Predictor:
    def __init__(self):
        print("Initializing Predictor")

    def work(self):
        return tf.__version__ + f"|{DEPLOY_TIME}"


if __name__ == "__main__":
    print("Deploy Time:" + str(DEPLOY_TIME))
    import ray

    runtime_env = {
        "container": {
            "image": "localhost/raytest/container:v1",
            # "worker_path": "/home/ray/anaconda3/lib/python3.7/site-packages/ray/workers/default_worker.py",
            "run_options": [
                "--env PATH=/home/ray/anaconda3/bin:$PATH",
                "-v /root/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7:/root/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7",
            ],
        }
    }
    ray.init(namespace="indexing", address="auto", runtime_env=runtime_env)
    try:
        old = ray.get_actor("tf1")
        print("Killing TF1")
        ray.kill(old)
    except ValueError:
        print("Not Killing TF1 as it's not present")
    PredictorActor = ray.remote(Predictor)
    PredictorActor.options(name="tf1", lifetime="detached").remote()
    a = ray.get_actor("tf1")
    print("Named Actor Call")
    print(ray.get(a.work.remote()))
You should replace the “/root/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7” part with your own Python environment. It seems hacky because I found the worker_path param didn’t work in the latest version (1.12.0). I have created an issue
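Since the same environment path has to appear on both sides of the `-v` mount, one way to avoid typos is to build the run_options list from the driver’s own `sys.prefix`. This is a sketch of my workaround above, not an official Ray API; the PATH value is the one from my example:

```python
import sys


def container_run_options(env_path=None):
    """Build the run_options list used in the runtime_env above.

    Bind-mounts the driver's Python environment into the container at the
    same path so the worker can import the same site-packages. env_path
    defaults to the running interpreter's prefix (e.g. the pyenv virtualenv).
    """
    env_path = env_path or sys.prefix
    return [
        "--env PATH=/home/ray/anaconda3/bin:$PATH",
        f"-v {env_path}:{env_path}",
    ]


# Usage with the path from the snippet above:
# container_run_options("/root/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7")
```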
python main_container.py
It works on my side. Can you try it on yours? We will support this in “ray submit” in a future version.