Ray Serve container runtime_env cannot use GPU

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am trying to run a Ray Serve deployment with Docker containers, but I am not able to get the replicas to use the GPU. I am running a local setup with a single GPU.

docker-compose file for ray head:

    ray_head:
        image: ray_head
        container_name: ray_head
        build:
            context: ./ray_pipeline/ray_head
            dockerfile: Dockerfile
        command: ray start --head --dashboard-host=0.0.0.0 --block
        ports:
            - 8265:8265     # ray dashboard
            - 8000:8000     # ray serve
            - 10001:10001   # ray client
        restart: always
        privileged: true
        volumes:
            - /dev/shm:/dev/shm
            - /var/lib/containers:/var/lib/containers
        environment:
            - NVIDIA_VISIBLE_DEVICES=all
        networks:
            - ray_network
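As a side note, if the head container itself should also see the GPU, newer Compose files usually add an explicit device reservation alongside `NVIDIA_VISIBLE_DEVICES`. A sketch, assuming the NVIDIA Container Toolkit is installed on the host:

```yaml
# Added under the ray_head service; requires the NVIDIA container runtime.
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```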

Snippet for ray serve deployment:

import ray
from ray import serve

runtime_env = {
    "container": {
        "image": "detector:latest",
        "run_options": ["--gpus all", "-v /dev/shm:/dev/shm", "--privileged", "--log-level=debug"],
    }
}

@serve.deployment(num_replicas=1, ray_actor_options={"num_cpus": 1, "num_gpus": 0.25, "runtime_env": runtime_env})
class RayDetector(object):
    def __init__(self):
        import os
        import torch
        print(f'# ray.get_gpu_ids(): {ray.get_gpu_ids()}')
        print(f'# os.environ["CUDA_VISIBLE_DEVICES"]: {os.environ["CUDA_VISIBLE_DEVICES"]}')
        print(f'# torch.cuda.is_available(): {torch.cuda.is_available()}')
        # initialize object detector
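One possible angle (an assumption on my part, not confirmed from any logs): Ray's experimental `container` runtime_env launches replicas with podman rather than docker, so the docker-style `--gpus all` flag in `run_options` may not behave the same way. With a CDI spec generated on the host (e.g. via `nvidia-ctk cdi generate`), a podman-style variant of the dict above might look like this (hypothetical alternative, not verified):

```python
# Hypothetical run_options using podman's CDI device syntax instead of
# docker's --gpus flag; assumes a CDI spec for nvidia.com/gpu exists
# on the host where the replica containers are started.
runtime_env = {
    "container": {
        "image": "detector:latest",
        "run_options": [
            "--device", "nvidia.com/gpu=all",
            "-v", "/dev/shm:/dev/shm",
        ],
    }
}

print(runtime_env["container"]["run_options"][1])  # nvidia.com/gpu=all
```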

Snippet from main code:

import ray
from ray import serve

serve.start(detached=True, http_options={"host": "0.0.0.0"})

ray_detector_handle = serve.get_deployment('RayDetector').get_handle()

When I run the above code, I get the following output:

ray.get_gpu_ids(): [0]
os.environ["CUDA_VISIBLE_DEVICES"]: 0
torch.cuda.is_available(): False

What could be the issue here? It seems that the GPU device is properly detected, but PyTorch is not able to use it.