Is there any advice on how to get a HuggingFacePredictor to run on multiple GPUs? I tested on a single node with 1 vs. 2 GPUs and they ran at the same speed.
I'm using Facebook's zero-shot classification model in the HuggingFacePredictor.
PyTorch detects both GPUs.
I built the Ray cluster with 2 GPUs.
And I set `num_gpus_per_worker` to 2 when calling `predict` with the HuggingFacePredictor.
To make the predictor use both GPUs, pass `device_map="auto"` as a kwarg to the `from_checkpoint` call. Keep in mind that this will not improve prediction speed (in fact, it will most likely make it worse), because it simply splits the model's layers across multiple GPUs (naive pipeline parallelism). That lets you run inference with a model too large to fit on one GPU, but it has no benefit if you can already do inference on a single GPU.
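A minimal sketch of what this looks like, assuming a checkpoint built from a zero-shot pipeline (the checkpoint variable and task name here are illustrative, not from the original thread):

```python
from ray.train.huggingface import HuggingFacePredictor

# `checkpoint` is assumed to have been created earlier, e.g. via
# HuggingFaceCheckpoint.from_model(...) for facebook/bart-large-mnli.
predictor = HuggingFacePredictor.from_checkpoint(
    checkpoint,
    task="zero-shot-classification",
    # device_map="auto" is forwarded to the underlying transformers
    # pipeline; accelerate then shards the model's layers across all
    # visible GPUs (naive pipeline parallelism, not data parallelism).
    device_map="auto",
)
```

Because the layers are split across devices, each forward pass still runs the layers sequentially; the GPUs take turns rather than working on separate batches, which is why this does not speed up prediction.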
Understood. I incorrectly assumed it would process batches in parallel across all available GPUs rather than shard the model.