GPU Memory Aware Scheduling

Let’s say I have 2 GPUs with the following specs:

  1. 8GB memory
  2. 24GB memory

I have a trained model which takes at most 4 GB of GPU memory during inference.

Since the infer function needs to run on a GPU, I must declare “num_gpus” in the decorator. If I do not declare it, the process does not run on the GPU and fails with “no CUDA capable device detected.”
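
For reference, this is roughly what the task looks like today (the framework, model path, and argument names are placeholders):

import ray
import torch  # placeholder framework; any CUDA-based model applies

ray.init()

@ray.remote(num_gpus=1)  # without num_gpus the worker sees no CUDA device
def infer(batch):
    # Ray sets CUDA_VISIBLE_DEVICES for this worker, so "cuda" resolves to
    # the GPU the task was scheduled on.
    model = torch.load("model.pt", map_location="cuda")
    model.eval()
    with torch.no_grad():
        return model(batch.to("cuda")).cpu()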

In total I have 32 GB of GPU memory, which means I should be able to run 8 such inferences (8 × 4 GB) in parallel.

What should the @ray.remote decorator for such a function look like?
If I specify it as:

@ray.remote(num_gpus=0.5, resources={"GPUMemory": 4})
def infer(...):

This would be a good fit for the 8GB GPU, but my 24GB GPU would be underutilized: num_gpus=0.5 effectively reserves half of it (12GB out of 24GB) for a process that only needs 4GB.
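
To clarify, “GPUMemory” is not a built-in Ray resource; it is a custom resource I would register myself when starting each node, roughly like this (values in GB, and the resource name is my own convention):

import ray

# On a multi-node cluster, each node would advertise its GPU memory at startup:
#   ray start --head --resources='{"GPUMemory": 8}'                     # 8GB-GPU node
#   ray start --address=<head-address> --resources='{"GPUMemory": 24}'  # 24GB-GPU node
#
# On a single machine, the equivalent for testing would be:
ray.init(resources={"GPUMemory": 32})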

Hope I was able to explain the issue clearly.

TIA.