- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
Main question
I have a trained model. I have one GPU node with 4 GPUs and 40 CPUs. I wish to apply the model to test_data in parallel across the 40 CPUs (so each CPU processes 1/40 of test_data). How can I do this? A rough sketch of what I have in mind is included after the skeleton code below.
More details
I would like to avoid using Ray AIR if possible for two reasons:
- It is in beta testing.
- I would need to convert my PyTorch DataLoader to a Ray AIR Dataset. The GitHub issues page says that a tutorial on this is planned but not done yet, so I don't know how to do this.
From searching, I see a lot of questions about parallelizing over the 4 GPUs, but since I have 40 CPUs, I think parallelizing over the CPUs instead of the GPUs would be faster here. I am using PyTorch.
Skeleton code that loads the dataset, dataloader, and model is provided below.
import torch
from torch.utils.data import Dataset, DataLoader

my_dataset = Dataset(...)                 # my custom Dataset subclass
my_loader = DataLoader(my_dataset, ...)

model = ...                               # my nn.Module, constructed elsewhere
state_dict = torch.load(model_save_location)  # model_save_location: path to the saved state_dict
model.load_state_dict(state_dict)

device = torch.device('cpu')
model = model.to(device)
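
For context, here is roughly what I imagine the Ray version looking like, though I am not sure this is the right or idiomatic pattern. MyModel, the batch format, and the batch size are placeholders I made up for the sketch:

import ray
from torch.utils.data import Subset

ray.init(num_cpus=40)  # single node with 40 CPUs

@ray.remote(num_cpus=1)
def run_inference(dataset, shard_indices, state_dict_path):
    # Each task rebuilds the model on CPU and scores its shard of the test data.
    model = MyModel()  # placeholder for my actual model class
    model.load_state_dict(torch.load(state_dict_path, map_location='cpu'))
    model.eval()
    loader = DataLoader(Subset(dataset, shard_indices), batch_size=64)
    with torch.no_grad():
        return torch.cat([model(batch) for batch in loader])

# Split the test indices into 40 roughly equal shards, one per CPU.
indices = list(range(len(my_dataset)))
shards = [indices[i::40] for i in range(40)]
futures = [run_inference.remote(my_dataset, s, model_save_location) for s in shards]
results = torch.cat(ray.get(futures))

Is something along these lines reasonable, or is there a better way to shard the DataLoader/Dataset across CPUs with Ray?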
Notes:
- I am aware that I can use Pool from torch.multiprocessing (as detailed here); a rough sketch of that approach is below. But I would prefer to use Ray because, if I want to scale to multiple nodes in the future (I only have 1 GPU node now), I think it would be much easier with Ray than without.
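
For reference, the Pool-based alternative I am referring to would look roughly like this, I think. Again, MyModel, the batch format, and the batch size are placeholders, and this assumes the 'fork' start method so the worker processes inherit my_dataset and model_save_location:

import torch.multiprocessing as mp
from torch.utils.data import Subset

def predict_shard(shard_indices):
    # Each worker process loads the model on CPU and scores its shard.
    model = MyModel()  # placeholder for my actual model class
    model.load_state_dict(torch.load(model_save_location, map_location='cpu'))
    model.eval()
    loader = DataLoader(Subset(my_dataset, shard_indices), batch_size=64)
    with torch.no_grad():
        return torch.cat([model(batch) for batch in loader])

if __name__ == '__main__':
    indices = list(range(len(my_dataset)))
    shards = [indices[i::40] for i in range(40)]
    with mp.Pool(processes=40) as pool:
        results = torch.cat(pool.map(predict_shard, shards))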