How severe does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
Let me start this post by saying GPU memory is more precious than anything else in the world! Next, this question is a continuation of my previous question. Please see my simple actor below:
```python
import ray
import torch

ray.init()
ray.cluster_resources()

@ray.remote(num_gpus=0.5)
class Counter(object):
    def __init__(self):
        self.tensor = torch.ones((1, 3))
        self.device = "cuda:0"

    def move_and_increment(self):
        # Note: .to() is not in-place, so the result must be reassigned
        # for the tensor to actually move onto the GPU.
        self.tensor = self.tensor.to(self.device)
        self.tensor += 1

    def print(self):
        return self.tensor

print(f"torch.cuda.is_available(): {torch.cuda.is_available()}")

counters = [Counter.remote() for i in range(2)]
[c.move_and_increment.remote() for c in counters]
futures = [c.print.remote() for c in counters]
print(ray.get(futures))

ray.shutdown()
```
I have 1 Nvidia GeForce RTX 2080 (8 GB memory) and the above code works fine on it. However, please notice the `num_gpus=0.5` parameter in my actor. I have the following 2 questions about the `num_gpus` parameter:
- In my simple program, the actor and the main function live in the same file, and there are only a handful of actors. Both of these make it very easy to update the `num_gpus` parameter. But how do you edit this parameter (and others, say `num_cpus`, etc.) in a large project spanning multiple files? (See the sketch right after this list for the kind of pattern I mean.)
- Consider an RTX 3090 with 24 GB of GPU memory and a tiny tensor. In this case, if I use `num_gpus=1` (instead of 0.5) and run two actors, shouldn't Ray automatically find the free memory on the GPU and then place the second actor on the same GPU to save resources? That way, I could run a large number of actors on one GPU (I sketch this idea at the end of this post).
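To make the first question concrete, here is a minimal sketch of the kind of pattern I have in mind: keep all resource numbers in one dict and apply them at creation time with Ray's `.options()` override. The config dict and its names (`ACTOR_RESOURCES`, `resources.py`) are hypothetical inventions of mine, not an existing Ray feature; only `.options()` itself is real Ray API.

```python
import ray

ray.init()

# Hypothetical central config: in a real project this dict would live in
# its own module (e.g. resources.py) and every file would import it.
ACTOR_RESOURCES = {
    "counter": {"num_gpus": 0.5, "num_cpus": 1},
    # ...one entry per actor/task in the project...
}

@ray.remote  # no resource numbers hard-coded in the decorator
class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

# .options() overrides the resource requirements at creation time, so the
# numbers live in one place instead of being scattered across decorators.
counter = Counter.options(**ACTOR_RESOURCES["counter"]).remote()
print(ray.get(counter.increment.remote()))

ray.shutdown()
```

Is something like this the recommended way, or is there a better mechanism?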
In summary, is there a way to automatically calculate a value for the `num_gpus` parameter?
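The closest I can get on my own is the rough sketch below: measure GPU memory with PyTorch's `torch.cuda.mem_get_info()` and derive a fraction from a per-actor memory estimate. `ACTOR_MEM_BYTES` is a made-up number, and as far as I know Ray only does bookkeeping with the fraction rather than enforcing any actual memory limit, so this is just an approximation.

```python
import ray
import torch

ray.init()

# Assumption: each actor needs roughly this much GPU memory (made-up estimate).
ACTOR_MEM_BYTES = 500 * 1024**2  # ~500 MiB

free_bytes, total_bytes = torch.cuda.mem_get_info()  # (free, total) on cuda:0

# Fraction of the whole GPU one actor should reserve; Ray only uses this
# number for scheduling bookkeeping, it does not cap the actor's memory.
gpu_fraction = ACTOR_MEM_BYTES / total_bytes
max_actors = free_bytes // ACTOR_MEM_BYTES  # how many fit in free memory

@ray.remote
class Counter:
    def __init__(self):
        self.tensor = torch.ones((1, 3)).to("cuda:0")

    def increment(self):
        self.tensor += 1
        return self.tensor

# Pack as many actors as the estimate allows onto the single GPU.
counters = [
    Counter.options(num_gpus=gpu_fraction).remote() for _ in range(max_actors)
]
print(ray.get([c.increment.remote() for c in counters]))

ray.shutdown()
```

Is there anything built into Ray that does this kind of calculation for me?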