Distributed training with and without nn.DataParallel, and how Ray Train differs

1. Severity of the issue: (select one)
None: I’m just curious or want clarification.

2. Environment:

  • Ray version: 2.47.1
  • Python version: 3.12.8
  • OS: Arch
  • Cloud/Infrastructure: Local

In this tutorial, the model is wrapped with:

net = nn.DataParallel(net)

but in this other one it isn't used. Does that mean the first one will use every GPU in the cluster, while the second one will only use a single GPU (on every node, or overall)?
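For context, this is how I'm reading the first tutorial's setup (a minimal sketch on my side; the nn.Linear model below is just a placeholder, not the tutorial's actual network):

```python
import torch
import torch.nn as nn

# Placeholder model, not the actual network from the tutorial.
net = nn.Linear(128, 10)

if torch.cuda.device_count() > 1:
    # DataParallel splits each input batch across all GPUs visible
    # to this single process.
    net = nn.DataParallel(net)

net = net.to("cuda" if torch.cuda.is_available() else "cpu")
```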

And how does that differ from Ray Train?
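For reference, this is roughly the Ray Train pattern I'm comparing against (a minimal sketch based on the TorchTrainer examples; train_func, the toy nn.Linear model, and num_workers=2 are just placeholders):

```python
import torch.nn as nn
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer, prepare_model


def train_func():
    # Placeholder training loop; the real model and data would come
    # from the tutorial.
    model = nn.Linear(128, 10)
    # prepare_model moves the model to the worker's device and wraps
    # it in DistributedDataParallel.
    model = prepare_model(model)
    # ... forward / backward / optimizer steps would go here ...


# Each worker runs train_func in its own process and can be placed on
# any node in the cluster; with use_gpu=True each worker gets one GPU
# by default.
trainer = TorchTrainer(
    train_func,
    scaling_config=ScalingConfig(num_workers=2, use_gpu=True),
)
trainer.fit()
```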