How severely does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
I am getting started with Ray and want to use it to scale the training of my PyTorch neural network. Before using Ray with a PyTorch network class (i.e., nn.Module), I want to make sure I can use Ray with a simple tensor. Therefore, please see the snippet below:
import ray
import torch

ray.init()

@ray.remote
class Counter(object):
    def __init__(self):
        self.tensor = torch.ones((1, 3))
        self.device = "cuda:0"

    def move_and_increment(self):
        self.tensor.to(self.device)
        self.tensor += 1

    def print(self):
        return self.tensor

print(f"torch.cuda.is_available(): {torch.cuda.is_available()}")

counters = [Counter.remote() for i in range(2)]
[c.move_and_increment.remote() for c in counters]
futures = [c.print.remote() for c in counters]
print(ray.get(futures))

ray.shutdown()
I tried running it, but it failed miserably. Please see the reported error below:
$ python ray_test.py
2022-09-29 23:22:25,112 INFO worker.py:1509 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
torch.cuda.is_available(): True
2022-09-29 23:22:26,434 ERROR worker.py:399 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::Counter.move_and_increment() (pid=6008, ip=192.168.10.5, repr=<ray_test.Counter object at 0x7fa39c2db490>)
File "/home/ravi/learning_ray/ray_test.py", line 14, in move_and_increment
self.tensor.to(self.device)
File "/home/ravi/anaconda/envs/ray/lib/python3.9/site-packages/torch/cuda/__init__.py", line 172, in _lazy_init
torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
2022-09-29 23:22:26,474 ERROR worker.py:399 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::Counter.move_and_increment() (pid=6006, ip=192.168.10.5, repr=<ray_test.Counter object at 0x7f9fe462b520>)
File "/home/ravi/learning_ray/ray_test.py", line 14, in move_and_increment
self.tensor.to(self.device)
File "/home/ravi/anaconda/envs/ray/lib/python3.9/site-packages/torch/cuda/__init__.py", line 172, in _lazy_init
torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
[tensor([[1., 1., 1.]]), tensor([[1., 1., 1.]])]
I have enough free memory in my graphics card, as shown below:
$ nvidia-smi
Thu Sep 29 23:26:14 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05 Driver Version: 455.23.05 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... On | 00000000:01:00.0 Off | N/A |
| N/A 45C P8 7W / N/A | 443MiB / 7982MiB | 6% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1529 G /usr/lib/xorg/Xorg 45MiB |
| 0 N/A N/A 2812 G /usr/lib/xorg/Xorg 161MiB |
| 0 N/A N/A 2987 G /usr/bin/gnome-shell 105MiB |
| 0 N/A N/A 6112 G ...252301518872410907,131072 80MiB |
| 0 N/A N/A 8280 G ...RendererForSitePerProcess 36MiB |
+-----------------------------------------------------------------------------+
I think the error is related to lazy initialization. Can you please confirm and provide a way to fix it?
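In case it helps: from reading the docs, my assumption is that Ray controls which GPUs each worker process may see through the CUDA_VISIBLE_DEVICES environment variable, and that an actor created with a bare @ray.remote reserves zero GPUs, so inside the actor torch finds an empty device list even though the driver sees the GPU. A tiny stdlib-only sketch of the mechanism I mean (the environment variable is real; the zero-GPU scenario is my guess about what Ray does here):

```python
import os

# My understanding: Ray launches each worker with CUDA_VISIBLE_DEVICES
# restricted to the GPUs that worker has reserved. An actor created with a
# bare @ray.remote reserves num_gpus=0, so the variable is empty and any
# CUDA runtime initialized afterwards reports zero available devices.
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # what a 0-GPU worker would see
visible = [d for d in os.environ["CUDA_VISIBLE_DEVICES"].split(",") if d]
print(len(visible))  # 0 devices visible
```

If that is right, I would guess the fix is to declare the resource on the actor, e.g. @ray.remote(num_gpus=0.5) so my two actors can share the single GPU, and also to reassign self.tensor = self.tensor.to(self.device), since Tensor.to() returns a new tensor rather than moving the original in place. But I would appreciate confirmation.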