How to make each worker work only on its partition?

With Ray Train, we usually split the batch and each worker works on one part of it. Is it possible to partition the data into several partitions (number of partitions = number of workers) so that each worker works only on its own partition? How can I do this, please?

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hey @Medkne, which training framework are you using?

For PyTorch, there is a `prepare_data_loader` util.
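
A minimal sketch of how that can be used, assuming the Ray 2.x API where the util lives at `ray.train.torch.prepare_data_loader` (the toy dataset, shapes, and worker count here are placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

import ray.train.torch
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_loop_per_worker(config):
    dataset = TensorDataset(torch.randn(128, 8), torch.randn(128, 1))
    loader = DataLoader(dataset, batch_size=16, shuffle=True)

    # prepare_data_loader wraps the loader with a DistributedSampler,
    # so this worker only iterates over its own 1/num_workers shard.
    loader = ray.train.torch.prepare_data_loader(loader)

    for epoch in range(2):
        for x, y in loader:
            pass  # forward/backward pass would go here


trainer = TorchTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=2),
)
trainer.fit()
```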

For TensorFlow, `MultiWorkerMirroredStrategy` will shard the data by default.
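
For illustration, a hedged sketch of the TensorFlow path, assuming Ray's `TensorflowTrainer` and tf.data's default auto-sharding (the model and dataset are placeholders):

```python
import tensorflow as tf

from ray.train import ScalingConfig
from ray.train.tensorflow import TensorflowTrainer


def train_loop_per_worker(config):
    # Inside a Ray Train worker, TF_CONFIG is already set, so the
    # strategy can discover the other workers.
    strategy = tf.distribute.MultiWorkerMirroredStrategy()
    with strategy.scope():
        model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
        model.compile(optimizer="adam", loss="mse")

    # tf.data auto-sharding (AutoShardPolicy.AUTO) gives each worker
    # a distinct shard of this dataset by default.
    dataset = tf.data.Dataset.from_tensor_slices(
        (tf.random.normal([128, 8]), tf.random.normal([128, 1]))
    ).batch(16)
    model.fit(dataset, epochs=1, verbose=0)


trainer = TensorflowTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=2),
)
trainer.fit()
```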

If you are using the Ray Datasets integration, the data available on each worker will already be partitioned.
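
As a sketch of that path, assuming the Ray 2.x `ray.train.get_dataset_shard` API (names may differ in older releases; the dataset here is a placeholder):

```python
import ray
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_loop_per_worker(config):
    # Each worker receives only its own split of the "train" dataset;
    # Ray Train partitions it across workers automatically.
    shard = ray.train.get_dataset_shard("train")
    for batch in shard.iter_batches(batch_size=16):
        pass  # training step on this worker's batch


ds = ray.data.range(1_000)  # placeholder dataset
trainer = TorchTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=2),
    datasets={"train": ds},
)
trainer.fit()
```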

@matthewdeng I'm using PyTorch. Thank you for your answer!