How to use container in Runtime Environments?

Hi,
I’d like to use the ‘container’ option, but I’m not sure how to make it work. The following code works when I run it on the command line:

# tf1/main.py
import tensorflow as tf
import time
DEPLOY_TIME = time.time()
class Predictor:
    def __init__(self):
        print("Initializing Predictor")

    def work(self):
        return tf.__version__ + f"|{DEPLOY_TIME}"


if __name__ == "__main__":
    print("Deploy Time:" + str(DEPLOY_TIME))

    import ray
    import os
    ray.init(namespace='indexing')


    try:
        old = ray.get_actor("tf1")
        print("Killing TF1")
        ray.kill(old)
    except ValueError:
        print("Not Killing TF1 as it's not present")


    PredictorActor = ray.remote(Predictor)
    PredictorActor.options(name="tf1", lifetime="detached").remote()
    a = ray.get_actor("tf1")
    print("Named Actor Call")
    print(ray.get(a.work.remote()))


On the command line, it works with

 ray job submit --runtime-env-json='{"working_dir": "./", "pip": ["tensorflow==1.15"], "excludes": ["venv"]}' -- python main.py
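For reference, the JSON passed to --runtime-env-json is just a serialized dict; a minimal sketch of building the same payload in Python (standard library only):

```python
import json

# Runtime environment mirroring the CLI flags above: working_dir is
# uploaded to the cluster, pip deps are installed per job, and
# "excludes" keeps the local virtualenv out of the upload.
runtime_env = {
    "working_dir": "./",
    "pip": ["tensorflow==1.15"],
    "excludes": ["venv"],
}

# Serialize it exactly as --runtime-env-json expects.
payload = json.dumps(runtime_env)
print(payload)
```

The same dict can also be passed directly as `ray.init(runtime_env=...)` instead of going through the CLI.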

Re: using a container.

I’ve tried the following Dockerfile.

#Dockerfile

# set base image (host OS)
FROM python:3.6

# set the working directory in the container
WORKDIR /code

# copy the dependencies file to the working directory
COPY requirements.txt .

# install dependencies
RUN pip install -r requirements.txt

# copy the content of the local src directory to the working directory
COPY . .

# command to run on container start
CMD [ "python", "main.py" ] 

I build an image tagged ‘tf1:latest’ and then try:

ray job submit --runtime-env-json='{"container": {"image": "tf1:latest"}}' -- python main.py                                                                                                                              

My client calling code (which works when run on the command line, but not when using a container):

#client
import ray

ray.init(namespace="indexing")

print("Ray Namespace")
print(ray.get_runtime_context().namespace)


print("In Pipeline Indexing Both")
a = ray.get_actor("tf1")
print(ray.get(a.work.remote()))

Hi, did you see any error message?
Or could you upload your logs? You can enable debug logging by setting the env var “RAY_BACKEND_LOG_LEVEL=debug”, and upload the dir “/tmp/ray/session_latest/logs”.

Hey @rabraham Thanks for your feedback, and glad to see you are trying to use container in Runtime Environments!

First, I should say I’m sorry that the doc for container is insufficient and makes container runtime environments hard to use. We will enhance this part soon.

And about the container functionality, let me describe the current state:

  • We don’t support using container with “ray job submit” yet.
  • If you want to use this functionality, here are some requirements for your cluster environment:
    • You should install “podman” on the hosts of your Ray cluster because we need it to start containers. Make sure “podman” can start containers on your hosts, e.g. with the command podman run --cgroup-manager=cgroupfs --network=host --pid=host --ipc=host --env-host --entrypoint python docker.io/rayproject/ray:1.12.0
    • Your image should include a Python whose version is the same as your Ray cluster’s. And in that Python environment, you should already have installed a ray whose version is the same as your Ray cluster’s as well.
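A quick way to sanity-check the version requirement is to compare the (major, minor) pair of the cluster’s Python with the image’s; a sketch, where the image’s version string is assumed to come from running `python -V` inside the image:

```python
import platform

def version_key(version: str) -> tuple:
    """Reduce a version string like '3.7.7' to its (major, minor) pair,
    which is the part that must match between cluster and image."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

host_python = platform.python_version()  # the cluster side (this interpreter)
image_python = "3.7.7"                   # assumed: printed by `python -V` in the image

if version_key(host_python) != version_key(image_python):
    print(f"Version mismatch: host {host_python} vs image {image_python}")
```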

Following what I said above, here is an example from my side:

Create a Python 3.7.7 environment with pyenv:

pyenv install 3.7.7
pyenv virtualenv 3.7.7  ray-opensource-3.7.7
pyenv activate ray-opensource-3.7.7

Create an image from the Ray image. The Dockerfile:

FROM rayproject/ray:1.12.0
COPY pip.conf /home/ray/.pip/pip.conf
RUN /home/ray/anaconda3/bin/pip install -U pip
RUN /home/ray/anaconda3/bin/pip install tensorflow==1.15
USER root
CMD source /home/ray/.bashrc

Build the image with podman (you could also use the docker command):

podman build -t raytest/container:v1 .

Start a Ray cluster:

ray start --head

Install tensorflow for the driver (your main.py):

pip install tensorflow==1.15

Start your driver. The code:

# main_container.py
import tensorflow as tf
import time
DEPLOY_TIME = time.time()
class Predictor:
    def __init__(self):
        print("Initializing Predictor")

    def work(self):
        return tf.__version__ + f"|{DEPLOY_TIME}"


if __name__ == "__main__":
    print("Deploy Time:" + str(DEPLOY_TIME))

    import ray
    import os
    runtime_env={
        "container": {
            "image": "localhost/raytest/container:v1",
            #"worker_path": "/home/ray/anaconda3/lib/python3.7/site-packages/ray/workers/default_worker.py",
            "run_options": ["--env PATH=/home/ray/anaconda3/bin:$PATH", "-v /root/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7:/root/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7"],
        }
    }
    ray.init(namespace='indexing', address='auto', runtime_env=runtime_env)

    try:
        old = ray.get_actor("tf1")
        print("Killing TF1")
        ray.kill(old)
    except ValueError:
        print("Not Killing TF1 as it's not present")


    PredictorActor = ray.remote(Predictor)
    PredictorActor.options(name="tf1", lifetime="detached").remote()
    a = ray.get_actor("tf1")
    print("Named Actor Call")
    print(ray.get(a.work.remote()))

You should change the “/root/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7” part to match your own Python environment. It seems hacky because I found the worker_path param didn’t work in the latest version (1.12.0). I have created an issue.

python main_container.py

It works on my side. Can you try it on yours? We will support it in “ray job submit” in a future version.
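Since the hard-coded “-v /root/.pyenv/...” bind mount in the example must match your own virtualenv, one way to avoid editing it by hand is to derive the run_options from sys.prefix in the driver. A sketch, assuming the host env should appear at the same path inside the container:

```python
import sys

# sys.prefix points at the active virtualenv (e.g. the pyenv env above).
env_path = sys.prefix

# Mount the host env at the same path inside the container, mirroring
# the hard-coded "-v ..." option in the example.
run_options = [
    "--env PATH=/home/ray/anaconda3/bin:$PATH",
    f"-v {env_path}:{env_path}",
]
print(run_options)
```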

Thank you very much! @GuyangSong . I’ll try this as soon as I get a chance and let you know.


Hi @GuyangSong
I’m unable to get podman running on Ubuntu 20.04 LTS :frowning:

Unfortunately, my first task is to get this working for dev environments and most devs are on 20.04 LTS. I’m not sure what to do at this point but I may have to halt my investigations into the container option for now :(. It’s a pity, because I was really excited about this feature.

If I’m able to get back to this, I’ll let you know but I just wanted to give you an update.

@rabraham Can you show your error message or say any more about your podman test? Is your host a container, a VM, or a physical machine? Maybe your account doesn’t have permission to start containers?

I’d be glad to help :grinning:. And we should do something to enhance this part if we can find the root cause of your case.

Our use case (for now) is to run Ray on dev laptops. For that I have to get podman installed on our laptops, which are on Ubuntu 20.04. I’m having difficulty getting podman installed on that Ubuntu version. For later versions of Ubuntu, the existing podman installation instructions (e.g. brew install) work.

At the risk of my fiancé leaving me, I worked on Friday night on getting podman somewhat working on Ubuntu 20.04 :P. Let me try some more and I’ll paste my problem, but I don’t know if it’s your headache to make podman work on Ubuntu 20.04.

There are a few typos in that command? I think the correct one is below:

podman run --cgroup-manager=cgroupfs --network=host --pid=host --ipc=host --env=host --entrypoint python docker.io/rayproject/ray:1.12.0

I had to build podman from scratch… though I built it with

make BUILDTAGS="exclude_graphdriver_devicemapper seccomp selinux"
sudo make install

I get this error:

❯ podman run --cgroup-manager=cgroupfs --network=host --pid=host --ipc=host --env=host --entrypoint python docker.io/rayproject/ray:1.12.0
Error: loading seccomp profile () failed: seccomp not enabled in this build

--env-host works on my side. It means inheriting env variables from the host. But you don’t need to use it manually; it will be used by Ray. This command is only used to verify your environment.


I upgraded to Ubuntu 22.04 and got podman installed. However, when I run the commands below, I get the following error.

❯ sudo ln -s /usr/bin/python3 /usr/bin/python
❯ which python
/usr/bin/python
❯ python
Python 3.10.4 (main, Apr  2 2022, 09:04:19) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
❯  podman run --cgroup-manager=cgroupfs --network=host --pid=host --ipc=host --env-host --entrypoint python docker.io/rayproject/ray:1.12.0
Error: executable file `python` not found in $PATH: No such file or directory: OCI runtime attempted to invoke a command that was not found

I’m guessing it’s looking for python from the host? Because when I go into the docker image, I see:

❯ podman run -it docker.io/rayproject/ray:1.12.0 /bin/bash
(base) ray@b0e5ee97861d:~$ python
Python 3.7.7 (default, May  7 2020, 21:25:33) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
(base) ray@b0e5ee97861d:~$ 

You met this error because --env-host overrode your environment variables (including PATH) with the host’s.
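The failure mode can be reproduced without podman: executable lookup only walks PATH, so once the container searches the host’s PATH (which lacks the image’s /home/ray/anaconda3/bin), python is simply not found. An illustrative sketch with shutil.which, using a temporary fake bin directory:

```python
import os
import shutil
import stat
import tempfile

# Simulate the image's bin directory containing a `python` executable.
bindir = tempfile.mkdtemp()
exe = os.path.join(bindir, "python")
with open(exe, "w") as f:
    f.write("#!/bin/sh\n")
os.chmod(exe, os.stat(exe).st_mode | stat.S_IEXEC)

# A PATH that includes that bin dir finds python...
found = shutil.which("python", path=bindir)

# ...but a PATH that omits it (the host's PATH, in the error above) does not.
missing = shutil.which("python", path="/nonexistent")

print(found)    # the fake executable
print(missing)  # None
```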

Can you try what I said above?

Try to build a new image:

FROM rayproject/ray:1.12.0
COPY pip.conf /home/ray/.pip/pip.conf
RUN /home/ray/anaconda3/bin/pip install -U pip
RUN /home/ray/anaconda3/bin/pip install tensorflow==1.15
USER root
CMD source /home/ray/.bashrc

Thanks @GuyangSong
Is this what you wanted me to try? My Dockerfile is the same as you wrote above.

FROM rayproject/ray:1.12.0

COPY pip.conf /home/ray/.pip/pip.conf

RUN /home/ray/anaconda3/bin/pip install -U pip
RUN /home/ray/anaconda3/bin/pip install tensorflow==1.15
USER root
CMD source /home/ray/.bashrc

> cd tf1
> podman build -t  tf1 .  
...
Successfully tagged localhost/tf1:latest

> podman run --cgroup-manager=cgroupfs --network=host --pid=host --ipc=host --env-host --entrypoint python localhost/tf1:latest

I still get the same error.

Error: executable file `python` not found in $PATH: No such file or directory: OCI runtime attempted to invoke a command that was not found

@rabraham Please try to start the Ray cluster and run the driver directly. You should modify the driver code like what I pasted. I think your podman command is OK, and you don’t need to run the podman command at this time.

Hi @GuyangSong

I’m getting the following error.


❯ export RAY_ADDRESS="http://127.0.0.1:8265"
❯ pwd
/home/rajiv/Documents/dev/bht/wdml/steps/tf1
❯ which python
/home/rajiv/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7/bin/python
❯ ray stop --force && ray start --head
Stopped all 7 Ray processes.
...
❯ python main_container.py
Deploy Time:1654047748.5368917
Traceback (most recent call last):
  File "main_container.py", line 66, in <module>
    ray.init(namespace='indexing', address='auto', runtime_env=runtime_env)
  File "/home/rajiv/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/rajiv/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7/lib/python3.7/site-packages/ray/worker.py", line 864, in init
    builder = ray.client(address, _deprecation_warn_enabled=False)
  File "/home/rajiv/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7/lib/python3.7/site-packages/ray/client_builder.py", line 367, in client
    builder = _get_builder_from_address(address)
  File "/home/rajiv/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7/lib/python3.7/site-packages/ray/client_builder.py", line 337, in _get_builder_from_address
    ), f"Module: {module_string} does not have ClientBuilder."
AssertionError: Module: http does not have ClientBuilder.
❯ which ray
/home/rajiv/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7/bin/ray
❯ pip freeze | grep ray
ray==1.12.0

# main_container.py
import tensorflow as tf
import time
DEPLOY_TIME = time.time()
class Predictor:
    def __init__(self):
        print("Initializing Predictor")

    def work(self):
        return tf.__version__ + f"|{DEPLOY_TIME}"


if __name__ == "__main__":
    print("Deploy Time:" + str(DEPLOY_TIME))

    import ray
    import os
    runtime_env={
        "container": {
            "image": "localhost/raytest/container:v1",
            #"worker_path": "/home/ray/anaconda3/lib/python3.7/site-packages/ray/workers/default_worker.py",
            # "run_options": ["--env PATH=/home/ray/anaconda3/bin:$PATH", "-v /root/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7:/root/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7"],
            "run_options": ["--env PATH=/home/ray/anaconda3/bin:$PATH", "-v  /home/rajiv/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7:/root/.pyenv/versions/3.7.7/envs/ray-opensource-3.7.7"],
        }
    }
    ray.init(namespace='indexing', address='auto', runtime_env=runtime_env)

    try:
        old = ray.get_actor("tf1")
        print("Killing TF1")
        ray.kill(old)
    except ValueError:
        print("Not Killing TF1 as it's not present")


    PredictorActor = ray.remote(Predictor)
    PredictorActor.options(name="tf1", lifetime="detached").remote()
    a = ray.get_actor("tf1")
    print("Named Actor Call")
    print(ray.get(a.work.remote()))


I had a question: how do we know that this code is running in a docker container, and not in the virtual environment and terminal that I run python main_container.py in?

I’m not sure if it matters, but I had to pin protobuf~=3.19.0 as I think there was a regression recently.

@rabraham Yes, there is an issue with protobuf recently, and it’s discussed in this thread.

Sorry for my late reply. You can check the logs.
Enable debug-level logging before you start the Ray nodes:

export RAY_BACKEND_LOG_LEVEL=debug

And then search the logs for podman:

grep podman -r /tmp/ray/session_latest/logs/
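If it’s more convenient from Python, here is a small sketch equivalent to the grep above (the session_latest path is the same one mentioned earlier):

```python
import pathlib

def find_podman_lines(log_dir: str):
    """Yield (file name, line) pairs mentioning podman, like `grep podman -r`."""
    for path in pathlib.Path(log_dir).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # skip unreadable files
        for line in text.splitlines():
            if "podman" in line:
                yield path.name, line

# Usage:
# for name, line in find_podman_lines("/tmp/ray/session_latest/logs"):
#     print(name, line)
```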

What’s your status now?