Ray Serve container runtime_env cannot use GPU

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am trying to run a Ray Serve deployment with Docker containers, but I cannot get the replicas to use the GPU. I am running on a local setup with one GPU.

docker-compose file for the Ray head:

```yaml
  ray_head:
    image: ray_head
    container_name: ray_head
    build:
      context: ./ray_pipeline/ray_head
      dockerfile: Dockerfile
    command: ray start --head --dashboard-host= --block
    ports:
      - 8265:8265     # ray dashboard
      - 8000:8000     # ray serve
      - 10001:10001   # ray client
    restart: always
    privileged: true
    volumes:
      - /dev/shm:/dev/shm
      - /var/lib/containers:/var/lib/containers
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    networks:
      - ray_network
```
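As an aside, with recent Compose versions GPU access for a service is usually requested through a device reservation rather than `NVIDIA_VISIBLE_DEVICES` alone. A minimal sketch (the service name and image are taken from the file above; this requires the NVIDIA Container Toolkit on the host):

```yaml
services:
  ray_head:
    image: ray_head
    # Ask the NVIDIA container runtime to expose all host GPUs
    # to this service via a Compose device reservation.
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```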

Snippet for the Ray Serve deployment:

```python
import ray
from ray import serve

runtime_env = {
    "container": {
        "image": "detector:latest",
        "run_options": ["--gpus all", "-v /dev/shm:/dev/shm", "--privileged", "--log-level=debug"],
    }
}

@serve.deployment(num_replicas=1, ray_actor_options={"num_cpus": 1, "num_gpus": 0.25, "runtime_env": runtime_env})
class RayDetector:
    def __init__(self):
        import os
        import torch
        print(f'# ray.get_gpu_ids(): {ray.get_gpu_ids()}')
        print(f'# os.environ["CUDA_VISIBLE_DEVICES"]: {os.environ["CUDA_VISIBLE_DEVICES"]}')
        print(f'# torch.cuda.is_available(): {torch.cuda.is_available()}')
        # initialize the object detector here
```

Snippet from the main code:

```python
import ray
from ray import serve

serve.start(detached=True, http_options={"host": ""})

ray_detector_handle = serve.get_deployment('RayDetector').get_handle()
```

When I run the above code, I get the following output:

```
ray.get_gpu_ids(): [0]
os.environ["CUDA_VISIBLE_DEVICES"]: 0
torch.cuda.is_available(): False
```

What could be the issue here? The GPU device appears to be detected correctly, but PyTorch cannot use it.
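When `CUDA_VISIBLE_DEVICES` is set but `torch.cuda.is_available()` returns False, a common cause is that the NVIDIA runtime hooks never injected the driver libraries into the container. A stdlib-only sketch along these lines (the `cuda_diagnostics` function is my own, not a Ray or torch API) can be dropped into the replica's `__init__` to narrow it down:

```python
import ctypes.util
import os
import shutil


def cuda_diagnostics():
    """Collect quick hints about why torch.cuda.is_available() might be False.

    Pure stdlib, so it runs even when torch itself cannot see the GPU.
    """
    return {
        # Ray sets this to the GPU ids it assigned to the actor.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        # The NVIDIA container runtime reads this to decide which devices to expose.
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        # nvidia-smi missing inside the container usually means the NVIDIA
        # runtime hooks did not run for this container.
        "nvidia-smi on PATH": shutil.which("nvidia-smi") is not None,
        # libcuda.so is injected by the NVIDIA container runtime; if the
        # dynamic linker cannot find it, torch cannot initialize CUDA.
        "libcuda found": ctypes.util.find_library("cuda") is not None,
    }


print(cuda_diagnostics())
```

If `nvidia-smi on PATH` or `libcuda found` is False inside the replica's container, the problem is the container runtime setup rather than Ray's GPU scheduling.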

Is there any chance this is an issue with the Docker container itself? What do you see if you run the container and manually check torch.cuda.is_available() in the Python interpreter?

I have solved this by installing nvidia-docker2 inside the Ray head container.

Now I am running into another problem: on the dashboard's 'cluster' page, I cannot see the status (e.g. pending tasks, object refs in scope) of the replicas that are spun up using podman.

Could you please tell me: did you manage to pull the service image and run the service in a container this way?