Let’s say I have 2 GPUs with the following specs:
- 8GB memory
- 24GB memory
I have a trained model that takes at most 4 GB of GPU memory during inference.
Since the infer function needs to run on a GPU, I must declare "num_gpus" in the decorator. If I do not declare it, the task does not run on the GPU and fails with "no CUDA-capable device detected".
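For context, here is a minimal sketch of what the remote function looks like today (the torch.nn.Linear model and the batch shape are placeholders for my actual model and data):

```python
import ray
import torch

ray.init()

@ray.remote(num_gpus=1)  # without num_gpus, the task sees no CUDA device and .cuda() fails
def infer(batch):
    # Placeholder model: in my real code this is the trained ~4 GB model.
    model = torch.nn.Linear(1024, 10).cuda()
    with torch.no_grad():
        return model(batch.cuda()).cpu()

result = ray.get(infer.remote(torch.randn(32, 1024)))
```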
In total I have 32 GB of GPU memory, which means I should be able to run 8 inferences (4 GB each) in parallel.
What should the @ray.remote decorator for such a function look like?
If I specify it as:
```python
@ray.remote(num_gpus=0.5, resources={"GPUMemory": 4})
def infer(...):
    ...
```
It is a best fit for the 8 GB GPU, but my other 24 GB GPU will be underutilized: num_gpus=0.5 reserves half of that GPU (effectively 12 GB of its 24 GB) for a process that only needs 4 GB.
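To show the scheduling behaviour I mean, here is a small repro sketch (infer_dummy is a stand-in for the real inference, the sleep just simulates the work, and no custom GPUMemory resource is involved here):

```python
import time
import ray

ray.init()  # assuming both GPUs are attached to this one machine

@ray.remote(num_gpus=0.5)
def infer_dummy(i):
    # Report which GPU the task was assigned and pretend to work for 2 s.
    gpu_ids = ray.get_gpu_ids()
    time.sleep(2)
    return i, gpu_ids

# With 2 physical GPUs and num_gpus=0.5 per task, at most 4 of these 8 tasks
# run at the same time (2 per GPU), even though 8 * 4 GB would fit in 32 GB.
print(ray.get([infer_dummy.remote(i) for i in range(8)]))
```

In this sketch the 24 GB GPU only ever runs 2 tasks at once, which is exactly the under-utilization I want to avoid.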
I hope that explains the problem.
TIA.