Keep PyTorch DataLoader when using Ray Data

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Is there a way to use Ray Data with my existing PyTorch DataLoader?

Here is my current code.

for inputs in dataloader:    # iterate over batches from the existing DataLoader
    outputs = model(inputs)  # inference

Now I want to decouple inference from post-processing so that CPU and GPU resources are fully utilized. I think my code could be rewritten with Ray Data as follows.

ds =
ds =
ds = ds.map_batches(model,, num_gpus=1, batch_size=1024)
ds =
# then pull data from ds

The problem is migrating the data loading logic from the PyTorch DataLoader to Ray Data. The Ray Data documentation says:

Any logic for reading data from cloud storage and disk can be replaced by one of the Ray Data read_* APIs, and any transformation logic can be applied as a map call on the Dataset.

However, the migration can take considerable effort when the data loading logic is complicated. Besides, I don’t want to risk degrading data loading performance.

It seems that a Ray Dataset is always constructed from some data source. Is there any way to use Ray Data with my current PyTorch DataLoader, or do I need to use more primitive Ray APIs (for example, manipulating tasks and actors directly) to decouple inference from post-processing?
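For reference, here is a minimal sketch of the kind of decoupling in question, with the data loading loop left untouched. It uses plain Python threads and a bounded queue as a stand-in for Ray tasks or actors, so it only illustrates the structure, not a Ray API. The names fake_dataloader, fake_model, postprocess, and run_pipeline are hypothetical placeholders, not part of PyTorch or Ray.

```python
import queue
import threading


def fake_dataloader(num_batches, batch_size):
    """Hypothetical stand-in for the existing DataLoader: yields lists of ints."""
    for b in range(num_batches):
        yield [b * batch_size + i for i in range(batch_size)]


def fake_model(inputs):
    """Hypothetical stand-in for model(inputs): doubles every element."""
    return [2 * x for x in inputs]


def postprocess(outputs):
    """Hypothetical post-processing step: sums one batch of outputs."""
    return sum(outputs)


def run_pipeline(dataloader, model, max_pending=4):
    # Bounded queue: inference blocks when post-processing falls behind.
    pending = queue.Queue(maxsize=max_pending)
    results = []
    done = object()  # sentinel marking the end of the stream

    def worker():
        # Post-processing runs here, concurrently with the inference loop.
        while True:
            item = pending.get()
            if item is done:
                break
            results.append(postprocess(item))

    t = threading.Thread(target=worker)
    t.start()
    for inputs in dataloader:       # existing data loading logic, unchanged
        pending.put(model(inputs))  # inference stays in the main thread
    pending.put(done)
    t.join()
    return results


results = run_pipeline(fake_dataloader(num_batches=3, batch_size=4), fake_model)
print(results)
```

With a single worker thread the results come back in batch order. Whether Ray Data can provide the same overlap without rewriting the data loading logic is exactly the open question.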