- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
Main question
I have a trained model. I have one GPU node with 4 GPUs and 40 CPUs. I wish to apply the model to test_data in parallel across the 40 CPUs (so each CPU processes 1/40 of test_data). How can I do this? A rough sketch of what I have in mind is included after the skeleton code below.
More details
I would like to avoid using Ray AIR if possible for two reasons:
- It is in beta testing.
- I would need to convert my PyTorch DataLoader to a Ray AIR Dataset. The GitHub issues page says that a tutorial on this is planned but not done yet, so I don't know how to do this.
From searching, I see a lot of questions about parallelizing over the 4 GPUs, but since I have 40 CPUs, I think parallelizing over the CPUs instead of the GPUs would be faster here. I am using PyTorch.
Skeleton code that loads the dataset, dataloader, and model is provided below.
import torch
from torch.utils.data import Dataset, DataLoader

my_dataset = Dataset(...)                 # my custom Dataset subclass
my_loader = DataLoader(my_dataset, ...)

model = ...                               # my nn.Module, constructed elsewhere
state_dict = torch.load(model_save_location)  # model_save_location: path to the saved state_dict
model.load_state_dict(state_dict)

device = torch.device('cpu')
model = model.to(device)
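
For context, here is roughly what I imagine the Ray version looking like, though I am not sure this is the right or idiomatic pattern. MyModel, the batch format, and the batch size are placeholders I made up for the sketch:

import ray
from torch.utils.data import Subset

ray.init(num_cpus=40)  # single node with 40 CPUs

@ray.remote(num_cpus=1)
def run_inference(dataset, shard_indices, state_dict_path):
    # Each task rebuilds the model on CPU and scores its shard of the test data.
    model = MyModel()  # placeholder for my actual model class
    model.load_state_dict(torch.load(state_dict_path, map_location='cpu'))
    model.eval()
    loader = DataLoader(Subset(dataset, shard_indices), batch_size=64)
    with torch.no_grad():
        return torch.cat([model(batch) for batch in loader])

# Split the test indices into 40 roughly equal shards, one per CPU.
indices = list(range(len(my_dataset)))
shards = [indices[i::40] for i in range(40)]
futures = [run_inference.remote(my_dataset, s, model_save_location) for s in shards]
results = torch.cat(ray.get(futures))

Is something along these lines reasonable, or is there a better way to shard the DataLoader/Dataset across CPUs with Ray?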
Notes:
- I am aware that I can use Pool from torch.multiprocessing (as detailed here); a rough sketch of that approach is below. But I would prefer to use Ray because, if I want to scale to multiple nodes in the future (I only have 1 GPU node now), I think it would be much easier with Ray than without.
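
For reference, the Pool-based alternative I am referring to would look roughly like this, I think. Again, MyModel, the batch format, and the batch size are placeholders, and this assumes the 'fork' start method so the worker processes inherit my_dataset and model_save_location:

import torch.multiprocessing as mp
from torch.utils.data import Subset

def predict_shard(shard_indices):
    # Each worker process loads the model on CPU and scores its shard.
    model = MyModel()  # placeholder for my actual model class
    model.load_state_dict(torch.load(model_save_location, map_location='cpu'))
    model.eval()
    loader = DataLoader(Subset(my_dataset, shard_indices), batch_size=64)
    with torch.no_grad():
        return torch.cat([model(batch) for batch in loader])

if __name__ == '__main__':
    indices = list(range(len(my_dataset)))
    shards = [indices[i::40] for i in range(40)]
    with mp.Pool(processes=40) as pool:
        results = torch.cat(pool.map(predict_shard, shards))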