How to use container in Runtime Environments?

Hey @rabraham, thanks for your feedback, and glad to see you are trying to use container in Runtime Environments!

First, I'm sorry that the documentation for container is insufficient and makes container runtime environments hard to use. We will improve this part soon.

About the container functionality, here is the current state:

  • We don’t support using container with “ray submit” yet.
  • If you want to use this functionality, there are some requirements for your cluster environment (see the quick check after this list):
    • You should install “podman” on the hosts of your Ray cluster, because we need it to start the containers. Make sure “podman” can start containers on your hosts, e.g. with the command podman run --cgroup-manager=cgroupfs --network=host --pid=host --ipc=host --env-host --entrypoint python docker.io/rayproject/ray:1.12.0
    • Your image should include a Python whose version is the same as your Ray cluster’s, and that Python environment should already have Ray installed with the same version as your Ray cluster as well.
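
For example, here is a quick way to check that the image’s Python and Ray versions match your cluster (a minimal sketch, assuming the rayproject/ray:1.12.0 image from above; adjust it to your own image):

# Versions inside the image:
podman run --rm --entrypoint python docker.io/rayproject/ray:1.12.0 \
  -c "import sys, ray; print(sys.version.split()[0], ray.__version__)"
# Versions on the host; they should be the same:
python -c "import sys, ray; print(sys.version.split()[0], ray.__version__)"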

Following what I said above, here is an example on my side:

Create a Python 3.7.7 environment with pyenv:

pyenv install 3.7.7
pyenv virtualenv 3.7.7  ray-opensource-3.7.7
pyenv activate ray-opensource-3.7.7
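
The driver also needs Ray in this environment (an assumption based on the driver script below, which imports ray); pin it to the same version as your cluster:

pip install ray==1.12.0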

Create an image from the Ray image; the Dockerfile:

FROM rayproject/ray:1.12.0
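# Copies my local pip configuration into the image; drop this line if you don't use one.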
COPY pip.conf /home/ray/.pip/pip.conf
RUN /home/ray/anaconda3/bin/pip install -U pip
RUN /home/ray/anaconda3/bin/pip install tensorflow==1.15
USER root
CMD source /home/ray/.bashrc

Build the image with podman (you could also use the docker command):

podman build -t raytest/container:v1 .
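
Optionally, you can verify that the built image contains the expected Ray and TensorFlow versions (a small sketch reusing the image tag from above):

podman run --rm --entrypoint python localhost/raytest/container:v1 \
  -c "import ray, tensorflow as tf; print(ray.__version__, tf.__version__)"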

Start a Ray cluster:

ray start --head
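
You can confirm the head node is up before starting the driver:

ray status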

Install tensorflow for the driver (your main.py):

pip install tensorflow==1.15

Start your driver. The code:

# main_container.py
import tensorflow as tf
import time
DEPLOY_TIME = time.time()
class Predictor:
    def __init__(self):
        print("Initializing Predictor")

    def work(self):
        # Return the TensorFlow version seen inside the container, plus the deploy time.
        return tf.__version__ + f"|{DEPLOY_TIME}"


if __name__ == "__main__":
    print("Deploy Time:" + str(DEPLOY_TIME))

    import ray
    import os
    runtime_env = {
        "container": {
            "image": "localhost/raytest/container:v1",
            # worker_path has no effect in 1.12.0, so the driver's virtualenv is
            # mounted into the container via run_options instead (see the note below).
            #"worker_path": "/home/ray/anaconda3/lib/python3.7/site-packages/ray/workers/default_worker.py",
            "run_options": ["--env PATH=/home/ray/anaconda3/bin:$PATH", "-v /root/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7:/root/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7"],
        }
    }
    ray.init(namespace='indexing', address='auto', runtime_env=runtime_env)

    try:
        old = ray.get_actor("tf1")
        print("Killing TF1")
        ray.kill(old)
    except ValueError:
        print("Not Killing TF1 as it's not present")


    PredictorActor = ray.remote(Predictor)
    PredictorActor.options(name="tf1", lifetime="detached").remote()
    a = ray.get_actor("tf1")
    print("Named Actor Call")
    print(ray.get(a.work.remote()))

You should change the “/root/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7” part to match your own Python environment. It seems hacky, but that is because I found the worker_path parameter didn’t work in the latest version (1.12.0). I have created an issue for it.
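
If you would rather not hard-code that path, here is a small sketch for building run_options from the driver’s own environment (it assumes the driver runs inside the virtualenv you want to mount, so sys.prefix points at its root):

import sys

# Mount the driver's virtualenv into the container at the same path,
# keeping the container's anaconda python first on PATH.
env_path = sys.prefix
run_options = [
    "--env PATH=/home/ray/anaconda3/bin:$PATH",
    f"-v {env_path}:{env_path}",
]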

python main_container.py

It works on my side. Could you try it on your side? We will support this in “ray submit” in a future version.