Hi, I searched the web extensively but couldn’t find an answer to my query, so I’m posting it here.
I’m planning to use Ray to deploy trained models in production pipelines. I know the upper bound of the GPU memory that each of my trained models will consume during inference.
Is it possible to specify the “GPU memory requirements” beforehand in @ray.remote in order to maximize GPU utilization and throughput?
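To make it concrete, here is roughly the shape of API I’m hoping exists (a sketch only; as far as I can tell, `gpu_memory` is not a real `@ray.remote` option, it’s just what I’d like to be able to write):

```python
import ray

# Hypothetical API -- "gpu_memory" is a made-up option, not an actual
# @ray.remote parameter; this only illustrates the behavior I'm after.
@ray.remote(gpu_memory=4 * 1024**3)  # reserve ~4GB of GPU memory
def infer(batch):
    # ideally Ray would place this on whichever GPU has >= 4GB free
    ...
```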
In case you’re going to suggest num_gpus: my cluster has heterogeneous GPUs. For instance, one node has 2 GPUs of 8GB each, another has 1 GPU of 24GB, etc.
If an inference task is going to take, say, 4GB of GPU memory, how do I specify that beforehand (i.e., GPU-memory-aware scheduling)?
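The closest workaround I could come up with is tagging each node with a custom resource at startup (a sketch under my own assumptions; `gpu_mem_gb` is a name I made up, and as I understand it Ray won’t actually enforce the memory limit, it’s only scheduler bookkeeping):

```python
import ray

# Start each worker node with a custom resource describing its total
# GPU memory, e.g.:
#   2x8GB node:  ray start --address=<head> --resources='{"gpu_mem_gb": 16}'
#   1x24GB node: ray start --address=<head> --resources='{"gpu_mem_gb": 24}'

ray.init(address="auto")

# Each inference task reserves 4 units of the custom resource plus a
# fractional GPU so Ray still pins it to a physical device. Note the
# problem: 0.5 of an 8GB GPU is 4GB, but 0.5 of the 24GB GPU is 12GB,
# so a single fraction doesn't map cleanly onto heterogeneous GPUs.
@ray.remote(num_gpus=0.5, resources={"gpu_mem_gb": 4})
def infer(batch):
    # model loading / inference happens here; Ray does not enforce the
    # 4GB cap -- the custom resource only affects scheduling decisions
    ...
```

Is there a cleaner, built-in way to do this, or is a custom resource per node the recommended approach?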