HuggingFacePredictor Multi-GPU

Is there any advice on how to get a `HuggingFacePredictor` to run on multiple GPUs? I tested on a single node with 1 vs. 2 GPUs, and both runs finished at the same speed.

I'm using Facebook's zero-shot model in the `HuggingFacePredictor`.

PyTorch detects both GPUs.
I build the Ray cluster with 2 GPUs.
And I set `num_gpus_per_worker` to 2 when calling `predict` on the `HuggingFacePredictor`.

cc @Yard1

To make the predictor use both GPUs, pass `device_map="auto"` as a kwarg to the `from_checkpoint` call. Keep in mind that this will not improve prediction speed (in fact, it will most likely make it worse), because it simply splits the model's layers across multiple GPUs (naive pipeline parallelism). This lets you run inference with a model too large to fit on a single GPU, but it has no benefit if the model already fits on one GPU.
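For reference, the call would look roughly like this (a sketch, not tested here — it assumes Ray AIR's `HuggingFacePredictor` API, where extra kwargs to `from_checkpoint` are forwarded to the underlying `transformers` pipeline; `checkpoint` and `df` stand in for your own checkpoint and input DataFrame):

```
from ray.train.huggingface import HuggingFacePredictor

predictor = HuggingFacePredictor.from_checkpoint(
    checkpoint,                  # placeholder: your HuggingFaceCheckpoint
    task="zero-shot-classification",
    device_map="auto",           # shard the model's layers across all visible GPUs
)

predictions = predictor.predict(df)  # placeholder: df is a pandas DataFrame of inputs
```

With `device_map="auto"`, each GPU holds a contiguous slice of layers and a single input still flows through them sequentially, which is why this sharding doesn't speed up inference.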

Understood. I incorrectly assumed it would run inference in parallel across all available GPUs rather than shard the model.