Does ray load the CUDA context multiple times?

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

This question is a continuation of my previous question. I am learning Ray, so I decided to write a simple actor, as shown below:

import ray
import torch

ray.init()
ray.cluster_resources() 

@ray.remote(num_gpus=1)
class Counter(object):
    def __init__(self):
        self.tensor = torch.ones((1, 3))
        self.device = "cuda:0"

    def move_and_increment(self):
        # .to() is not in-place; assign the result so the tensor is
        # actually moved to the GPU before being incremented
        self.tensor = self.tensor.to(self.device)
        self.tensor += 1

    def print(self):
        # return a CPU copy so the caller does not need its own CUDA
        # context just to deserialize the result
        return self.tensor.cpu()


print(f"torch.cuda.is_available(): {torch.cuda.is_available()}")

counters = [Counter.remote() for i in range(1)]
[c.move_and_increment.remote() for c in counters]
futures = [c.print.remote() for c in counters]
print(ray.get(futures))

ray.shutdown()

I have one NVIDIA GeForce RTX 2080 (8 GB memory) and the above code works fine on it. However, I noticed that even this simplest of actors consumes 1089 MiB of GPU memory, as shown below:

$ nvidia-smi 
Tue Oct  4 16:08:54 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   50C    P8    13W /  N/A |   2513MiB /  7982MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1554      G   /usr/lib/xorg/Xorg                160MiB |
|    0   N/A  N/A      2820      G   /usr/lib/xorg/Xorg                665MiB |
|    0   N/A  N/A      3001      G   /usr/bin/gnome-shell              105MiB |
|    0   N/A  N/A      3614      G   ...763400436228628087,131072      397MiB |
|    0   N/A  N/A     39131      G   ...RendererForSitePerProcess       78MiB |
|    0   N/A  N/A    141097      C   ...conda/envs/ray/bin/python     1089MiB |
+-----------------------------------------------------------------------------+
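To see how much of that 1089 MiB is actually tensor data, one can ask PyTorch's allocator from inside the actor; a minimal sketch (report_cuda_memory is just a helper name I am using here for illustration, it is not part of the actor above):

import torch

def report_cuda_memory():
    # memory_allocated: bytes currently occupied by live tensors
    # memory_reserved:  bytes held by PyTorch's caching allocator
    # Neither number includes the CUDA context itself, which is the bulk
    # of what nvidia-smi attributes to the process.
    return torch.cuda.memory_allocated(), torch.cuda.memory_reserved()

Called from inside the actor (for example via an extra remote method), this should report only a few hundred bytes allocated for the 1x3 tensor.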

It turned out that most of this memory is consumed by loading the CUDA context (kernels, etc.). However, when I use 2 actors with num_gpus=0.5 each, the memory consumption doubles and nvidia-smi reports two separate processes. Please see below:

$ nvidia-smi 
Tue Oct  4 16:13:30 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   52C    P0    28W /  N/A |   3398MiB /  7982MiB |     20%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1554      G   /usr/lib/xorg/Xorg                160MiB |
|    0   N/A  N/A      2820      G   /usr/lib/xorg/Xorg                688MiB |
|    0   N/A  N/A      3001      G   /usr/bin/gnome-shell              111MiB |
|    0   N/A  N/A      3614      G   ...763400436228628087,131072      172MiB |
|    0   N/A  N/A     39131      G   ...RendererForSitePerProcess       78MiB |
|    0   N/A  N/A    143170      C   ...nter.move_and_increment()     1087MiB |
|    0   N/A  N/A    143171      C   ...nter.move_and_increment()     1085MiB |
+-----------------------------------------------------------------------------+
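For reference, two such actors can be created from the same Counter class by overriding its GPU request with .options(); a minimal sketch of that setup (declaring the class with @ray.remote(num_gpus=0.5) directly is equivalent):

# Continuing from the script above (same Counter class, same ray.init()).
# Each actor runs in its own worker process, so each one initializes its
# own CUDA context -- roughly 1 GiB apiece in the nvidia-smi output above.
counters = [Counter.options(num_gpus=0.5).remote() for i in range(2)]
ray.get([c.move_and_increment.remote() for c in counters])
print(ray.get([c.print.remote() for c in counters]))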

Does this mean that Ray is loading the CUDA context twice? GPU memory is more precious than anything else in the world!!!

Any comments on this issue, please?

Hi @ravi,
Your understanding is correct. If two actors are scheduled on the same GPU, Ray does not share the CUDA context between them as of today, because they are two separate processes.
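
If the second ~1 GiB matters, one workaround is to keep all GPU work inside a single actor process so that only one CUDA context is ever created; a rough sketch under that assumption (the MultiCounter class below is only an illustration, not something Ray provides):

import ray
import torch

ray.init()

@ray.remote(num_gpus=1)
class MultiCounter(object):
    # One process owning several tensors -> one CUDA context in total.
    def __init__(self, n):
        self.tensors = [torch.ones((1, 3), device="cuda:0") for _ in range(n)]

    def move_and_increment(self):
        self.tensors = [t + 1 for t in self.tensors]

    def print(self):
        # return CPU copies so the caller does not need a CUDA context
        return [t.cpu() for t in self.tensors]

counter = MultiCounter.remote(2)
ray.get(counter.move_and_increment.remote())
print(ray.get(counter.print.remote()))

ray.shutdown()

The trade-off is that the two counters now share one process instead of running as two isolated actors.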

Thanks @Chen_Shen for the confirmation.