How severely does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
This question is a continuation of my previous question. I am learning Ray and decided to try writing a simple actor, shown below:
import ray
import torch

ray.init()
ray.cluster_resources()


@ray.remote(num_gpus=1)
class Counter(object):
    def __init__(self):
        self.tensor = torch.ones((1, 3))
        self.device = "cuda:0"

    def move_and_increment(self):
        # .to() is not in-place, so reassign to actually move the tensor
        self.tensor = self.tensor.to(self.device)
        self.tensor += 1

    def print(self):
        return self.tensor


print(f"torch.cuda.is_available(): {torch.cuda.is_available()}")

counters = [Counter.remote() for i in range(1)]
[c.move_and_increment.remote() for c in counters]
futures = [c.print.remote() for c in counters]
print(ray.get(futures))

ray.shutdown()
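As a side note, to double-check that Ray really assigned the GPU to the worker, I found ray.get_gpu_ids() handy. A minimal standalone sketch (the gpu_ids task here is just my own debugging helper, not part of the actor above):

import ray

ray.init()

@ray.remote(num_gpus=1)
def gpu_ids():
    # Inside a GPU task, ray.get_gpu_ids() reports which GPU IDs
    # Ray reserved for this worker, e.g. [0].
    return ray.get_gpu_ids()

print(ray.get(gpu_ids.remote()))

ray.shutdown()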
I have one NVIDIA GeForce RTX 2080 (8 GB memory), and the above code runs fine on it. However, I noticed that even this simplest actor consumes 1089 MiB of GPU memory, as shown below:
$ nvidia-smi
Tue Oct  4 16:08:54 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...   On  | 00000000:01:00.0  On |                  N/A |
| N/A   50C    P8    13W /  N/A |   2513MiB /  7982MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1554      G   /usr/lib/xorg/Xorg                160MiB |
|    0   N/A  N/A      2820      G   /usr/lib/xorg/Xorg                665MiB |
|    0   N/A  N/A      3001      G   /usr/bin/gnome-shell              105MiB |
|    0   N/A  N/A      3614      G   ...763400436228628087,131072      397MiB |
|    0   N/A  N/A     39131      G   ...RendererForSitePerProcess       78MiB |
|    0   N/A  N/A    141097      C   ...conda/envs/ray/bin/python     1089MiB |
+-----------------------------------------------------------------------------+
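The tensor itself is only 12 bytes, so to see where the memory actually goes I added a small debugging method to the Counter actor (a rough sketch of my own; the memory_report name is mine, and exact numbers vary by GPU and driver):

    def memory_report(self):
        # Bytes held by live tensors vs. bytes cached by PyTorch's
        # allocator; neither figure includes the CUDA context itself,
        # which nvidia-smi does charge to the process.
        return {
            "allocated_MiB": torch.cuda.memory_allocated() / 1024**2,
            "reserved_MiB": torch.cuda.memory_reserved() / 1024**2,
        }

print(ray.get(counters[0].memory_report.remote()))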
It turned out that most of this memory is consumed by CUDA context loading (kernels etc.). However, when using 2 actors with num_gpus=0.5, the memory consumption doubles, and nvidia-smi reports two processes. Please see below:
$ nvidia-smi
Tue Oct  4 16:13:30 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...   On  | 00000000:01:00.0  On |                  N/A |
| N/A   52C    P0    28W /  N/A |   3398MiB /  7982MiB |     20%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1554      G   /usr/lib/xorg/Xorg                160MiB |
|    0   N/A  N/A      2820      G   /usr/lib/xorg/Xorg                688MiB |
|    0   N/A  N/A      3001      G   /usr/bin/gnome-shell              111MiB |
|    0   N/A  N/A      3614      G   ...763400436228628087,131072      172MiB |
|    0   N/A  N/A     39131      G   ...RendererForSitePerProcess       78MiB |
|    0   N/A  N/A    143170      C   ...nter.move_and_increment()     1087MiB |
|    0   N/A  N/A    143171      C   ...nter.move_and_increment()     1085MiB |
+-----------------------------------------------------------------------------+
Does this mean that Ray is loading the CUDA context twice? GPU memory is more precious than anything else in the world!
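For what it's worth, the two PIDs above (143170 and 143171) already suggest that the two actors run in separate worker processes, and my understanding is that each process that touches the GPU initializes its own CUDA context. A minimal sketch I used to confirm that the actors really are distinct processes (the Probe actor is just my own helper):

import os
import ray

ray.init()

@ray.remote(num_gpus=0.5)
class Probe:
    def pid(self):
        # Each Ray actor lives in its own worker process, so two
        # actors should report two different process IDs.
        return os.getpid()

probes = [Probe.remote() for _ in range(2)]
print(ray.get([p.pid.remote() for p in probes]))  # two distinct PIDs

ray.shutdown()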