HuggingFacePredictor Multi-GPU

Eric_Bugin · May 11, 2023, 4:02pm

Is there any advice on how to get a HuggingFacePredictor to run on multiple gpus? I tested on a single node with 1 vs 2 gpus and they ran at the same speed.

I’m using Facebook’s Zero Shot model in the HuggingFacePredictor.

Pytorch detects both gpus.
I build the Ray Cluster with 2 gpus.
And I set ‘num_gpus_per_worker’ to 2 in the HuggingFacePredictor when calling ‘predict’.

cade · May 16, 2023, 9:16pm

cc @Yard1 HuggingFacePredictor Multi-GPU

Yard1 · June 8, 2023, 6:20am

In order to make the predictor use both GPUs, pass device_map="auto" as kwargs to from_checkpoint call. Keep in mind that this will not improve the speed of prediction (in fact, it will most likely make it worse) as it will simply split the model layers across multiple GPUs (naive pipeline parallelism). This would allow you to infer with a large model, but it has no benefits if you can already do inference with a single GPU.

Eric_Bugin · July 12, 2023, 12:55pm

Understood. I assumed incorrectly that it would parallel process with all available gpus, not shard the model.

Topic		Replies	Views
Transformers pipeline with ray does not work on gpu	0	887	September 8, 2023
Single node, 4x GPU, map_batches only using 1 Ray Data	3	688	October 5, 2023
Best practices to run multiple models in multiple GPUs in RayLLM Ray Train	0	717	February 8, 2024
[RAY SGD] Train pytorch model on machine with 2 GPUs Ray Tune	2	432	February 19, 2021
Serving LLM with multiple gpus Ray Serve	0	275	July 3, 2024

HuggingFacePredictor Multi-GPU

Related topics