How to make each worker work only on its partition?

With Ray Train, we usually split the batch and each worker works on one part of it. Is it possible to partition the data into several partitions (number of partitions = number of workers) so that each worker works only on its own partition? How can I do this, please?

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hey @Medkne, which training framework are you using?

For PyTorch, there is a `prepare_data_loader` util.
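
A minimal sketch of how that can be used, assuming the Ray 2.x API where the util lives at `ray.train.torch.prepare_data_loader` (the toy dataset, shapes, and worker count here are placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

import ray.train.torch
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_loop_per_worker(config):
    dataset = TensorDataset(torch.randn(128, 8), torch.randn(128, 1))
    loader = DataLoader(dataset, batch_size=16, shuffle=True)

    # prepare_data_loader wraps the loader with a DistributedSampler,
    # so this worker only iterates over its own 1/num_workers shard.
    loader = ray.train.torch.prepare_data_loader(loader)

    for epoch in range(2):
        for x, y in loader:
            pass  # forward/backward pass would go here


trainer = TorchTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=2),
)
trainer.fit()
```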

For TensorFlow, `MultiWorkerMirroredStrategy` will shard the data by default.
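
For illustration, a hedged sketch of the TensorFlow path, assuming Ray's `TensorflowTrainer` and tf.data's default auto-sharding (the model and dataset are placeholders):

```python
import tensorflow as tf

from ray.train import ScalingConfig
from ray.train.tensorflow import TensorflowTrainer


def train_loop_per_worker(config):
    # Inside a Ray Train worker, TF_CONFIG is already set, so the
    # strategy can discover the other workers.
    strategy = tf.distribute.MultiWorkerMirroredStrategy()
    with strategy.scope():
        model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
        model.compile(optimizer="adam", loss="mse")

    # tf.data auto-sharding (AutoShardPolicy.AUTO) gives each worker
    # a distinct shard of this dataset by default.
    dataset = tf.data.Dataset.from_tensor_slices(
        (tf.random.normal([128, 8]), tf.random.normal([128, 1]))
    ).batch(16)
    model.fit(dataset, epochs=1, verbose=0)


trainer = TensorflowTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=2),
)
trainer.fit()
```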

If you are using the Ray Datasets integration, the data available on each worker will already be partitioned.
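
As a sketch of that path, assuming the Ray 2.x `ray.train.get_dataset_shard` API (names may differ in older releases; the dataset here is a placeholder):

```python
import ray
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_loop_per_worker(config):
    # Each worker receives only its own split of the "train" dataset;
    # Ray Train partitions it across workers automatically.
    shard = ray.train.get_dataset_shard("train")
    for batch in shard.iter_batches(batch_size=16):
        pass  # training step on this worker's batch


ds = ray.data.range(1_000)  # placeholder dataset
trainer = TorchTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=2),
    datasets={"train": ds},
)
trainer.fit()
```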

@matthewdeng I'm using PyTorch. Thank you for your answer!