`ds.iter_batches` is much slower than torch `DataLoader`

I ran a test with `ds.iter_batches` (batch size = 32): every fetch takes approximately 40 ms, which I think is much too slow.
```python
import time

import numpy as np
import ray
from ray.data import Dataset, DatasetPipeline

arr = np.random.randint(10, size=(1028, 224, 224, 3))
dataset: Dataset = ray.data.from_numpy(arr)
pipe: DatasetPipeline = dataset.window(blocks_per_window=1)

start_time = time.time()
for data in pipe.iter_batches(batch_size=1, batch_format="numpy"):
    print('get data time is ', time.time() - start_time)
    start_time = time.time()
```
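For comparison, here is a plain-NumPy slicing loop over the same kind of array. This is not the torch `DataLoader` baseline from the title, just a rough sketch of what raw batch slicing costs without Ray in the loop; the helper name `iter_numpy_batches` and the smaller array shape are mine, not from the test above.

```python
import time
import numpy as np

# Smaller than the original (1028, 224, 224, 3) so the sketch runs quickly.
arr = np.random.randint(10, size=(64, 224, 224, 3))

def iter_numpy_batches(a, batch_size=1):
    # Yield consecutive batch_size-sized slices (views, so no copying).
    for i in range(0, len(a), batch_size):
        yield a[i:i + batch_size]

fetch_times = []
start_time = time.time()
for batch in iter_numpy_batches(arr, batch_size=1):
    fetch_times.append(time.time() - start_time)
    start_time = time.time()

print(f"mean fetch time: {1000 * sum(fetch_times) / len(fetch_times):.4f} ms")
```

On my understanding, each fetch here is essentially free because slicing returns a view, so the 40 ms per batch seen with `iter_batches` would be overhead from Ray's block handling rather than from moving the data itself.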

(For better linking of information, this is also posted on GitHub here and is being tracked here.)