Does instantiating the Hugging Face Dataset directly in train_loop_per_worker enable DDP?

ray11 · June 10, 2024, 5:29pm

I am instantiating the Hugging Face Dataset directly in the train_loop_per_worker, but it is not clear to me whether Ray Train automatically distributes the data among the workers (DDP/DP) or whether each worker works on a full copy of the dataset.

https://docs.ray.io/en/latest/train/user-guides/data-loading-preprocessing.html

Could anyone explain what happens behind the scenes here? My end goal is to enable DDP on single-node/multi-GPU…
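To make the two possibilities in the question concrete, here is a minimal plain-Python sketch of the difference between "every worker sees the full dataset" and "every worker sees its own shard". It is not taken from the Ray docs and does not claim to show what Ray Train does internally. The helper `shard_indices` and the numbers are hypothetical, and the strided split is a simplification (PyTorch's `DistributedSampler`, for example, also shuffles and pads). Inside a real train_loop_per_worker, the rank and world size would presumably come from `ray.train.get_context().get_world_rank()` and `get_world_size()`.

```python
def shard_indices(num_examples: int, world_size: int, rank: int) -> list[int]:
    """Hypothetical helper: indices one worker would process under a strided split.

    Simplified illustration only: no shuffling and no padding to equal length.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return list(range(rank, num_examples, world_size))


# Assumed toy setup: 10 examples, 4 workers (e.g. 4 GPUs on a single node).
num_examples = 10
world_size = 4

# Case 1: no sharding. Every worker iterates over all examples.
full_copy = [list(range(num_examples)) for _ in range(world_size)]

# Case 2: sharded. Each worker iterates over a disjoint subset.
shards = [shard_indices(num_examples, world_size, r) for r in range(world_size)]

for rank in range(world_size):
    print(f"rank {rank}: full copy sees {len(full_copy[rank])} examples, "
          f"shard sees {shards[rank]}")
```

One way to find out which case applies in a given setup would be to log the number of batches per epoch on each worker: if it does not shrink as workers are added, each worker is probably iterating over the full dataset.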
