Your Ray job is stuck in “pending” because the image_uri (container runtime environment) feature requires Podman to be installed and available on the host, and the Ray container must be able to launch nested containers using Podman. The Ray documentation and recent issues confirm that:
- Podman is required: Ray’s container runtime environment support is experimental and currently only works with Podman, not Docker, even if you set RAY_RUNTIME_ENV_PODMAN_EXE=/usr/bin/docker (Ray multi-app container guide, Ray GitHub issue #37293).
- Podman version: Your Podman version (3.4.4) is quite old; Ray recommends Podman 4.x+ for better compatibility (Ray multi-app container guide).
- Privileged container: If Ray is running inside a container, that container must be started with --privileged to allow Podman to launch nested containers (Ray multi-app container guide).
- Job stuck in pending: This is a known symptom when Podman is missing, misconfigured, or not accessible to the Ray process (Ray GitHub issue #37293, Ray Discuss thread).
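The Podman points above can be checked with a short preflight script on the host (or inside the Ray container) before submitting a job. This is only a sketch: the helper names parse_podman_version and check_podman are made up for illustration, and it assumes that `podman --version` prints a line such as "podman version 3.4.4". It does not verify that nested containers can actually be launched.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_podman_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from text such as 'podman version 3.4.4'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def check_podman(min_major: int = 4) -> str:
    """Report whether podman is on PATH and at least `min_major`.x (hypothetical helper)."""
    path = shutil.which("podman")
    if path is None:
        return "podman not found on PATH"

    # Assumption: `podman --version` writes the version string to stdout.
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    version = parse_podman_version(result.stdout)
    if version is None:
        return f"could not parse a version from: {result.stdout!r}"

    pretty = ".".join(str(part) for part in version)
    if version[0] < min_major:
        return f"podman {pretty} at {path} is older than {min_major}.x"
    return f"podman {pretty} at {path} looks OK"


if __name__ == "__main__":
    print(check_podman())
```

Run it as the same user that starts the Ray process, since a Podman binary that is visible to your shell may not be on the PATH that Ray sees.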
Summary:
You cannot use Docker as a drop-in replacement for Podman in Ray’s container runtime environment. You must install Podman (preferably v4.x+) on your EC2 instance, ensure it is in the PATH, and, if running Ray inside a container, start that container with --privileged. Only then will image_uri jobs work as expected.
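Once Podman is in place, a containerized job can be submitted roughly as follows. This is a minimal sketch rather than a definitive recipe: the helper names, the dashboard address, and the image name are placeholders, and it assumes a Ray version whose runtime_env accepts the image_uri field.

```python
from typing import Any, Dict


def build_runtime_env(image_uri: str) -> Dict[str, Any]:
    """Build a runtime_env that asks Ray to run the job inside `image_uri`."""
    return {"image_uri": image_uri}


def submit_containerized_job(address: str, entrypoint: str, image_uri: str) -> str:
    """Submit a job through the Ray Jobs API and return its job ID (hypothetical helper)."""
    # Imported lazily so build_runtime_env works without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    return client.submit_job(
        entrypoint=entrypoint,
        runtime_env=build_runtime_env(image_uri),
    )


if __name__ == "__main__":
    # Placeholder values: replace with your dashboard address and image.
    job_id = submit_containerized_job(
        address="http://127.0.0.1:8265",
        entrypoint="python -c \"print('hello from the container')\"",
        image_uri="example.com/my-ray-image:latest",
    )
    print(job_id)
```

If the job still sits in pending after this, polling JobSubmissionClient.get_job_status with the returned ID and checking the Ray logs on the node is a reasonable next step.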
Would you like more detail on troubleshooting steps or how to set up Podman for Ray?
Sources:
- Ray multi-app container guide: Troubleshooting
- Ray GitHub issue #37293
- Ray Discuss: Submitted containerized job is stuck in pending mode