GPU Memory Aware Scheduling

Let’s say I have 2 GPUs with the following specs:

  1. 8GB memory
  2. 24GB memory

I have a trained model which takes at most 4 GB of GPU memory during inference.

Since the infer function needs to run on a GPU, I must declare “num_gpus” in the decorator. If I do not declare it, the process does not run on the GPU and fails with “no CUDA capable device detected.”
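
For reference, this is roughly what the task looks like today (the framework, model path, and argument names are placeholders):

import ray
import torch  # placeholder framework; any CUDA-based model applies

ray.init()

@ray.remote(num_gpus=1)  # without num_gpus the worker sees no CUDA device
def infer(batch):
    # Ray sets CUDA_VISIBLE_DEVICES for this worker, so "cuda" resolves to
    # the GPU the task was scheduled on.
    model = torch.load("model.pt", map_location="cuda")
    model.eval()
    with torch.no_grad():
        return model(batch.to("cuda")).cpu()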

In total I have 32 GB of GPU memory, which means I should be able to run 8 such inferences (8 × 4 GB) in parallel.

What should the @ray.remote decorator for such a function look like?
If I specify it as:

@ray.remote(num_gpus=0.5, resources={"GPUMemory": 4})
def infer(...):

This would be a good fit for the 8GB GPU, but my 24GB GPU would be underutilized: num_gpus=0.5 effectively reserves half of it (12GB out of 24GB) for a process that only needs 4GB.
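
To clarify, “GPUMemory” is not a built-in Ray resource; it is a custom resource I would register myself when starting each node, roughly like this (values in GB, and the resource name is my own convention):

import ray

# On a multi-node cluster, each node would advertise its GPU memory at startup:
#   ray start --head --resources='{"GPUMemory": 8}'                     # 8GB-GPU node
#   ray start --address=<head-address> --resources='{"GPUMemory": 24}'  # 24GB-GPU node
#
# On a single machine, the equivalent for testing would be:
ray.init(resources={"GPUMemory": 32})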

Hope I was able to explain the issue clearly.

TIA.