Suppose a trained model is used for online inference and I have a GPU with 12 GB of memory.
While inference is running on the first image, which uses, say, 6 GB, a new image arrives, and we cannot know its exact size ahead of time.
If it needs, say, 3 GB, that is fine: the GPU can fit both images and run inference on them simultaneously.
But if the new image needs 8 GB, that is clearly more than the GPU's free memory.
My question is: in the second case, will the inference be scheduled to wait until enough memory is available, or will it trigger an OOM and still manage to run via a retry mechanism?
@Li_Bin I believe the OOM monitor may kill the first job if the inference job exceeds the memory threshold and needs more memory, and then reschedule it via the retry mechanism once memory has been freed up.
cc: @ClarenceNg how does the OOM handling work in case 2 when dealing with GPU memory?
We currently defer GPU memory management completely to the framework being used. So @Li_Bin it will cause a CUDA OOM unless the framework you're using does something about it.
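For example, with PyTorch the failure just surfaces as an exception in your own code, so you can build a retry around it yourself. A rough sketch (the helper name and retry policy here are only illustrative, not something the framework provides for you):

```python
import time
import torch

def infer_with_retry(model, batch, max_retries=3, wait_s=5.0):
    # Illustrative helper: `model` and `batch` stand in for whatever
    # your serving code actually passes around.
    for attempt in range(max_retries):
        try:
            with torch.no_grad():
                return model(batch.cuda())
        except torch.cuda.OutOfMemoryError:  # plain RuntimeError on older PyTorch
            # Drop cached allocator blocks and give the other request
            # time to finish before trying again.
            torch.cuda.empty_cache()
            time.sleep(wait_s)
    raise RuntimeError("still out of GPU memory after retries")
```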
Note that there are advanced techniques to get around these limitations! (Or something as simple as getting a beefier GPU for inference.)
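One simple version of the "wait until enough memory is available" behavior you asked about is to poll free GPU memory before admitting a request. A minimal sketch, assuming PyTorch and that you can estimate the request's footprint yourself:

```python
import time
import torch

def wait_for_gpu_memory(required_bytes, poll_s=1.0, timeout_s=60.0):
    # Block until the current CUDA device reports at least
    # `required_bytes` free, or the timeout expires. The estimate is
    # yours to supply; fragmentation can still cause an OOM.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        free_bytes, _total_bytes = torch.cuda.mem_get_info()
        if free_bytes >= required_bytes:
            return True
        time.sleep(poll_s)
    return False
```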
Can you share more about your workload? Exactly what model? There are frameworks to do model-parallel inference but they come with significant R&D requirements and usually take a throughput hit.