How to specify GPU resources in terms of GPU RAM rather than a fraction of a GPU

I was trying to use the custom resources feature to tell each node how much of a custom resource, gpu_memory_mb, it had, and then schedule my tasks based on their GPU RAM needs (the same way I use memory= for regular RAM). However, I seem to have to give each task a num_gpus option > 0.0 for it to see the GPU at all.
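
Roughly what I was trying, as a sketch (the gpu_memory_mb resource name and the totals are just placeholders I picked):

```python
import ray

# Declare the custom resource on the node, the same way I'd do it at startup with
#   ray start --head --resources='{"gpu_memory_mb": 16000}'
ray.init(resources={"gpu_memory_mb": 16000})

# Request GPU RAM the same way memory= is used for host RAM.
@ray.remote(resources={"gpu_memory_mb": 4000}, memory=2 * 1024**3)
def train_step():
    # Without num_gpus > 0, this task doesn't get a GPU assigned to it.
    return "done"

print(ray.get(train_step.remote()))
```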

I’ve been sticking in num_gpus=0.1 and assuming that the GPU RAM resource specification will be the only one that matters, but it feels very hacky. I also have to specify the GPU RAM totals manually when launching each worker node. Any chance this is something Ray could support in the core API, similar to the memory= option?
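
For reference, the workaround currently looks something like this (the numbers are illustrative):

```python
import ray

# One 16 GB GPU node, with the GPU RAM total maintained by hand.
ray.init(num_gpus=1, resources={"gpu_memory_mb": 16000})

@ray.remote(num_gpus=0.1, resources={"gpu_memory_mb": 8000})
def gpu_task():
    import os
    # The token num_gpus=0.1 is only there so the task gets a GPU assigned;
    # the real constraint is the gpu_memory_mb request.
    return os.environ.get("CUDA_VISIBLE_DEVICES")

print(ray.get(gpu_task.remote()))
```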

(The use case here is that I have limited access to a small number of GPUs with 32 GB of RAM, and easier access to some 16 GB GPUs. I’d like to tie them all together in one Ray cluster, but the GPU fraction needed for a given amount of GPU RAM depends on which kind of node the task lands on: a task needing 8 GB would be num_gpus=0.25 on a 32 GB card but num_gpus=0.5 on a 16 GB card. Related to Gpu wise memory allocation.)

> However, I seem to have to give each task a num_gpus option > 0.0 for it to see the GPU at all.

Can you tell me a bit more about what this means? What do you mean by “see the GPU at all”?

To be honest, I’m not sure exactly how it happens myself, but, e.g., a CuPy call will raise an exception indicating that no CUDA devices are available.
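
A rough sketch of what triggers it, assuming CuPy is installed on the worker (the exact exception text may differ):

```python
import ray

ray.init(num_gpus=1)

@ray.remote  # note: no num_gpus here
def no_gpu_requested():
    import cupy as cp
    # This is where I'd expect something like
    # CUDARuntimeError: cudaErrorNoDevice: no CUDA-capable device is detected
    return int(cp.arange(10).sum())

try:
    print(ray.get(no_gpu_requested.remote()))
except Exception as exc:
    print(type(exc).__name__, exc)
```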

I see. I guess it’s because CUDA_VISIBLE_DEVICES is not being set properly. According to GPU Support — Ray v2.0.0.dev0, Ray automatically sets this env var when num_gpus is specified.
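
Something like this should show the difference (the printed values are just what I’d expect, not verified here):

```python
import os
import ray

ray.init(num_gpus=1)

@ray.remote(num_gpus=0.1)
def with_gpu():
    return ray.get_gpu_ids(), os.environ.get("CUDA_VISIBLE_DEVICES")

@ray.remote
def without_gpu():
    return ray.get_gpu_ids(), os.environ.get("CUDA_VISIBLE_DEVICES")

print(ray.get(with_gpu.remote()))     # e.g. ([0], "0")
print(ray.get(without_gpu.remote()))  # e.g. ([], "") -- no visible devices
```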

@rliaw is there a recommended way to use Ray with GPU memory-based scheduling? Should he always specify a small fraction for num_gpus?