Does ray load the CUDA context multiple times?

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

This question is a continuation of my previous question. I am learning Ray, so I decided to write a simple actor, as shown below:

import ray
import torch

ray.init()
ray.cluster_resources() 

@ray.remote(num_gpus=1)
class Counter(object):
    def __init__(self):
        self.tensor = torch.ones((1, 3))
        self.device = "cuda:0"

    def move_and_increment(self):
        # .to() is not in-place; assign the result so the tensor is
        # actually moved to the GPU before being incremented
        self.tensor = self.tensor.to(self.device)
        self.tensor += 1

    def print(self):
        # return a CPU copy so the caller does not need its own CUDA
        # context just to deserialize the result
        return self.tensor.cpu()


print(f"torch.cuda.is_available(): {torch.cuda.is_available()}")

counters = [Counter.remote() for i in range(1)]
[c.move_and_increment.remote() for c in counters]
futures = [c.print.remote() for c in counters]
print(ray.get(futures))

ray.shutdown()

I have one NVIDIA GeForce RTX 2080 (8 GB memory) and the above code works fine on it. However, I noticed that even this simplest of actors consumes 1089 MiB of GPU memory, as shown below:

$ nvidia-smi 
Tue Oct  4 16:08:54 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   50C    P8    13W /  N/A |   2513MiB /  7982MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1554      G   /usr/lib/xorg/Xorg                160MiB |
|    0   N/A  N/A      2820      G   /usr/lib/xorg/Xorg                665MiB |
|    0   N/A  N/A      3001      G   /usr/bin/gnome-shell              105MiB |
|    0   N/A  N/A      3614      G   ...763400436228628087,131072      397MiB |
|    0   N/A  N/A     39131      G   ...RendererForSitePerProcess       78MiB |
|    0   N/A  N/A    141097      C   ...conda/envs/ray/bin/python     1089MiB |
+-----------------------------------------------------------------------------+
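To see how much of that 1089 MiB is actually tensor data, one can ask PyTorch's allocator from inside the actor; a minimal sketch (report_cuda_memory is just a helper name I am using here for illustration, it is not part of the actor above):

import torch

def report_cuda_memory():
    # memory_allocated: bytes currently occupied by live tensors
    # memory_reserved:  bytes held by PyTorch's caching allocator
    # Neither number includes the CUDA context itself, which is the bulk
    # of what nvidia-smi attributes to the process.
    return torch.cuda.memory_allocated(), torch.cuda.memory_reserved()

Called from inside the actor (for example via an extra remote method), this should report only a few hundred bytes allocated for the 1x3 tensor.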

It turned out that most of this memory is consumed by loading the CUDA context (kernels, etc.). However, when I use 2 actors with num_gpus=0.5 each, the memory consumption doubles and nvidia-smi reports two separate processes. Please see below:

$ nvidia-smi 
Tue Oct  4 16:13:30 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   52C    P0    28W /  N/A |   3398MiB /  7982MiB |     20%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1554      G   /usr/lib/xorg/Xorg                160MiB |
|    0   N/A  N/A      2820      G   /usr/lib/xorg/Xorg                688MiB |
|    0   N/A  N/A      3001      G   /usr/bin/gnome-shell              111MiB |
|    0   N/A  N/A      3614      G   ...763400436228628087,131072      172MiB |
|    0   N/A  N/A     39131      G   ...RendererForSitePerProcess       78MiB |
|    0   N/A  N/A    143170      C   ...nter.move_and_increment()     1087MiB |
|    0   N/A  N/A    143171      C   ...nter.move_and_increment()     1085MiB |
+-----------------------------------------------------------------------------+
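For reference, two such actors can be created from the same Counter class by overriding its GPU request with .options(); a minimal sketch of that setup (declaring the class with @ray.remote(num_gpus=0.5) directly is equivalent):

# Continuing from the script above (same Counter class, same ray.init()).
# Each actor runs in its own worker process, so each one initializes its
# own CUDA context -- roughly 1 GiB apiece in the nvidia-smi output above.
counters = [Counter.options(num_gpus=0.5).remote() for i in range(2)]
ray.get([c.move_and_increment.remote() for c in counters])
print(ray.get([c.print.remote() for c in counters]))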

Does this mean that Ray is loading the CUDA context twice? GPU memory is more precious than anything else in the world!!!

Any comments on this issue, please?

Hi @ravi,
Your understanding is correct. If two actors are scheduled on the same GPU, Ray does not share the CUDA context between them as of today, because they are two separate processes.
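
If the second ~1 GiB matters, one workaround is to keep all GPU work inside a single actor process so that only one CUDA context is ever created; a rough sketch under that assumption (the MultiCounter class below is only an illustration, not something Ray provides):

import ray
import torch

ray.init()

@ray.remote(num_gpus=1)
class MultiCounter(object):
    # One process owning several tensors -> one CUDA context in total.
    def __init__(self, n):
        self.tensors = [torch.ones((1, 3), device="cuda:0") for _ in range(n)]

    def move_and_increment(self):
        self.tensors = [t + 1 for t in self.tensors]

    def print(self):
        # return CPU copies so the caller does not need a CUDA context
        return [t.cpu() for t in self.tensors]

counter = MultiCounter.remote(2)
ray.get(counter.move_and_increment.remote())
print(ray.get(counter.print.remote()))

ray.shutdown()

The trade-off is that the two counters now share one process instead of running as two isolated actors.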

Thanks @Chen_Shen for the confirmation.