We are using Ray Datasets to transform a set of PDF files, extract information from them, and perform some NLP tasks like classification, Named Entity Recognition, embedding generation, etc. It’s a pipeline that transforms the dataset, adding columns to it as each row passes through. At least three steps in the pipeline use the GPU for inference. The pipeline is deployed as a Ray Serve based service with a FastAPI ingress. The pipeline itself runs asynchronously in a separate actor, apart from the main ingress process. The FastAPI deployment is used to kick off the pipeline on a bunch of files in a GCS bucket.
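To make the setup concrete, here is a minimal sketch of the rough shape of the deployment. Names like PdfPipelineRunner, EmbeddingModel, and extract_text are placeholders, not our real code:

```python
import ray
from ray import serve
from ray.data import ActorPoolStrategy
from fastapi import FastAPI


def extract_text(batch):
    # CPU step: adds a column to each batch (placeholder extraction logic)
    batch["text"] = [b[:100] for b in batch["bytes"]]
    return batch


class EmbeddingModel:
    """Callable class so the model is loaded once per Datasets actor, on GPU."""

    def __init__(self):
        import torch
        assert torch.cuda.is_available()  # GPU step; there are ~3 such steps in the real pipeline
        # self.model = load_model().to("cuda")

    def __call__(self, batch):
        # Run inference and add a new column (placeholder output)
        batch["embedding"] = [[0.0]] * len(batch["text"])
        return batch


@ray.remote
class PdfPipelineRunner:
    """Separate actor so the pipeline runs apart from the Serve ingress process."""

    def run(self, gcs_uri: str):
        ds = ray.data.read_binary_files(gcs_uri)  # PDFs in a GCS bucket
        ds = ds.map_batches(extract_text, batch_format="pandas")
        ds = ds.map_batches(
            EmbeddingModel,
            compute=ActorPoolStrategy(1, 2),  # pool of GPU actors
            num_gpus=1,                       # each actor gets one GPU
            batch_size=16,
            batch_format="pandas",
        )
        ds.write_parquet("/tmp/output")


app = FastAPI()


@serve.deployment
@serve.ingress(app)
class Ingress:
    def __init__(self):
        self.runner = PdfPipelineRunner.remote()

    @app.post("/run")
    async def run(self, gcs_uri: str):
        # Kick off the pipeline asynchronously; don't block the ingress.
        self.runner.run.remote(gcs_uri)
        return {"status": "started"}
```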
What we are seeing is that when we run the whole deployment directly on a VM, it uses the GPU just fine. But as soon as we dockerize it and run it in a container, the GPU is not being used properly, even though the GPU is available to PyTorch inside the container and we have given --num-gpus=1 to ray start. We see some actors on the GPU with the nvidia-smi command, but utilization remains at 0% throughout the run of the pipeline. This is preventing us from putting this deployment into Kubernetes or otherwise into Docker containers.
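This is roughly how we confirm that both Ray and PyTorch see the GPU inside the container (an illustrative check, not our exact code); all of it looks correct, yet utilization stays at 0% while the Datasets pipeline runs:

```python
# Ray is started inside the container with: ray start --head --num-gpus=1
import os
import ray
import torch

ray.init(address="auto")
print(ray.cluster_resources())    # shows 'GPU': 1.0, so Ray sees the GPU
print(torch.cuda.is_available())  # True inside the container as well


@ray.remote(num_gpus=1)
def gpu_check():
    # Inside a GPU task, Ray sets CUDA_VISIBLE_DEVICES for us.
    return {
        "gpu_ids": ray.get_gpu_ids(),
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_sees_cuda": torch.cuda.is_available(),
    }


print(ray.get(gpu_check.remote()))
```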
Another thing to note is that if we don’t use Ray Datasets and instead run similar code directly in GPU actors, it works fine within containers.
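For contrast, the plain-actor version that does behave correctly in the same container looks roughly like this (GpuWorker and infer_batch are placeholder names):

```python
import ray
import torch


@ray.remote(num_gpus=1)
class GpuWorker:
    def __init__(self):
        self.device = "cuda"
        # self.model = load_model().to(self.device)

    def infer_batch(self, batch):
        # Run this way inside Docker, nvidia-smi shows real GPU utilization.
        x = torch.randn(len(batch), 768, device=self.device)  # placeholder compute
        return x.sum().item()


worker = GpuWorker.remote()
print(ray.get(worker.infer_batch.remote(["doc1", "doc2"])))
```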
We are using Ray version 2.4.0