Unable to pass a Docker image to ray job submit

I pulled the image rayproject/ray:2.51.0.801bd7-extra-py310 from Docker Hub with podman (Ray prefers podman internally).

On AWS EC2, I then started a head instance with the Deep Learning Base AMI with Single CUDA (Ubuntu 22.04), ami-06eff6f62c23006e9 (64-bit (x86)).

On the instance, I have:

ubuntu@ip-172-31-15-118:~$ python --version
Python 3.10.19
ubuntu@ip-172-31-15-118:~$ python3 --version
Python 3.10.19
ubuntu@ip-172-31-15-118:~$ docker --version
Docker version 28.5.1, build e180ab8
ubuntu@ip-172-31-15-118:~$ podman --version
podman version 3.4.4


python3 -m venv ~/raycli-env
source ~/raycli-env/bin/activate
pip install -U "ray[default]"

(raycli-env) ubuntu@ip-172-31-15-118:~$ ray --version
ray, version 2.51.0

I have also exported the following environment variables:

export RAY_RUNTIME_ENV_DOCKER=1
export RAY_RUNTIME_ENV_PODMAN_EXE=/usr/bin/docker
export PATH=/usr/bin:$PATH

I started the Ray cluster with:

ray start --head --port=6379 --dashboard-host=0.0.0.0

Then I ran:

mkdir ray-job
cd ray-job

echo 'import ray
ray.init(address="auto")
@ray.remote
def hello():
    return "Hello World from Ray on GPU!"
print(ray.get(hello.remote()))' > hello_ray.py

Then I extended the image with the following Dockerfile:

ARG RAY_UID=1000
ARG RAY_GID=100

FROM rayproject/ray:2.51.0.801bd7-extra-py310

USER root
RUN pip install "ray[default]"

# go back to ray user for running workloads
USER ray
WORKDIR /home/ray

CMD ["python", "hello_ray.py"]

And built the image with:

podman build -t ray-image:latest

But when I run the following command, I don’t get any errors, and the job stays in pending forever.

ray job submit \
  --address="http://127.0.0.1:8265" \
  --runtime-env-json '{"image_uri": "localhost/ray-image:latest"}' \
  -- python hello_ray.py

But when I run this with a python-slim image with matching Python versions, the job is submitted successfully.

I don’t know what I am doing wrong. The Python and Ray versions on the EC2 instance and in the podman image are the same.
Please help me. I have been stuck on this for a long time.

Your Ray job is stuck in “pending” because the image_uri (container runtime environment) feature requires Podman to be installed and available on the host, and, if Ray itself runs inside a container, that container must be able to launch nested containers using Podman. The Ray documentation and recent issues confirm this.

Summary:
You cannot use Docker as a drop-in replacement for Podman in Ray’s container runtime environment. You must install Podman (preferably v4.x+) on your EC2 instance, ensure it is in the PATH, and, if running Ray inside a container, start that container with --privileged. Only then will image_uri jobs work as expected.
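
As a quick way to check the host against the summary above, here is a minimal sketch that looks for podman on the PATH and compares its major version with 4. The helper names parse_podman_version and check_podman are hypothetical, the threshold comes only from the "preferably v4.x+" remark above, and the script does not test whether nested containers can actually be launched.

```python
import re
import shutil
import subprocess


def parse_podman_version(text):
    """Extract (major, minor, patch) from output like 'podman version 3.4.4'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def check_podman(min_major=4):
    """Return a short human-readable verdict about the podman found on PATH.

    min_major=4 is an assumption taken from the 'preferably v4.x+' advice,
    not a limit read from Ray itself.
    """
    exe = shutil.which("podman")
    if exe is None:
        return "podman not found on PATH"
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=True
    )
    version = parse_podman_version(result.stdout)
    dotted = ".".join(str(part) for part in version)
    if version[0] < min_major:
        return f"podman {dotted} at {exe} is older than {min_major}.x"
    return f"podman {dotted} at {exe} looks OK"


if __name__ == "__main__":
    print(check_podman())
```

Run it in the same shell (and virtual environment) that runs ray start, so that it sees the same PATH as the Ray processes.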

Would you like more detail on troubleshooting steps or how to set up Podman for Ray?
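
As a starting point for troubleshooting, here is a rough sketch that submits the same job through the Ray Jobs Python SDK and prints the status and message while it waits, which can show more than the CLI does when a job sits in pending. The address, entrypoint and image name are copied from the question; the helper names is_terminal and watch_job, the timeout and the poll interval are assumptions for illustration.

```python
import time

# Job states after which polling can stop.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def is_terminal(status):
    """Accept either a JobStatus enum member or a plain string."""
    name = getattr(status, "value", status)
    return str(name) in TERMINAL_STATES


def watch_job(address, entrypoint, image_uri, timeout_s=120, poll_s=5):
    """Submit a job and print its status/message until it finishes or times out."""
    # Imported here so the helper above can be used without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    job_id = client.submit_job(
        entrypoint=entrypoint,
        runtime_env={"image_uri": image_uri},
    )
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        info = client.get_job_info(job_id)
        print(info.status, "-", info.message)
        if is_terminal(info.status):
            break
        time.sleep(poll_s)
    # Logs may be empty if the job never left PENDING.
    print(client.get_job_logs(job_id))
    return job_id


if __name__ == "__main__":
    watch_job(
        address="http://127.0.0.1:8265",
        entrypoint="python hello_ray.py",
        image_uri="localhost/ray-image:latest",
    )
```

If the status never leaves PENDING, the message field and the runtime environment agent logs on the head node are the next places to look.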
