How severely does this issue affect your experience of using Ray?
- Medium: It contributes significant difficulty to completing my task, but I can work around it.
The Problem:
I have a large Ray Dataset and want to split it like this:

```python
train_ds, test_ds = ray_ds.train_test_split(test_size=0.05, shuffle=True)

for batch in train_ds.iter_torch_batches(batch_size=1, device="cuda:0"):
    ...  # training loop
```
Unfortunately, this only works with `shuffle=False`; with `shuffle=True` I get the following error:
```
Traceback (most recent call last):
  File "/home/USER/PycharmProjects/Projectname_Preprocessing/CanonicalDataset.py", line 395, in <module>
    for batch in train.iter_torch_batches(batch_size=1, device="cuda:0", ):
  File "/home/USER/anaconda3/envs/Projectname_Preprocessing/lib/python3.9/site-packages/ray/data/dataset.py", line 2523, in iter_torch_batches
    for batch in self.iter_batches(
  File "/home/USER/anaconda3/envs/Projectname_Preprocessing/lib/python3.9/site-packages/ray/data/dataset.py", line 2450, in iter_batches
    yield from batch_blocks(
  File "/home/USER/anaconda3/envs/Projectname_Preprocessing/lib/python3.9/site-packages/ray/data/_internal/block_batching.py", line 129, in batch_blocks
    yield from get_batches(block_window[0])
  File "/home/USER/anaconda3/envs/Projectname_Preprocessing/lib/python3.9/site-packages/ray/data/_internal/block_batching.py", line 99, in get_batches
    result = _format_batch(batch, batch_format)
  File "/home/USER/anaconda3/envs/Projectname_Preprocessing/lib/python3.9/site-packages/ray/data/_internal/block_batching.py", line 147, in _format_batch
    batch = BlockAccessor.for_block(batch).to_numpy()
  File "/home/USER/anaconda3/envs/Projectname_Preprocessing/lib/python3.9/site-packages/ray/data/_internal/arrow_block.py", line 217, in to_numpy
    arrays.append(array.to_numpy(zero_copy_only=False))
  File "/home/USER/anaconda3/envs/Projectname_Preprocessing/lib/python3.9/site-packages/ray/air/util/tensor_extensions/arrow.py", line 285, in to_numpy
    return self._to_numpy(zero_copy_only=zero_copy_only)
  File "/home/USER/anaconda3/envs/Projectname_Preprocessing/lib/python3.9/site-packages/ray/air/util/tensor_extensions/arrow.py", line 269, in _to_numpy
    return np.ndarray(shape, dtype=ext_dtype, buffer=data_buffer, offset=offset)
TypeError: buffer is too small for requested array
```
I'd be grateful for any ideas on how to fix this.
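One possible workaround, assuming the error is triggered by the shuffle step inside `train_test_split`, is to shuffle the dataset first and then split without shuffling, i.e. `ray_ds.random_shuffle().train_test_split(test_size=0.05, shuffle=False)`. `random_shuffle` is a public Ray Dataset method, but whether it sidesteps this particular Arrow tensor-extension error is untested. The shuffle-then-split idea itself can be sketched with plain NumPy (illustrative only, not Ray code):

```python
import numpy as np


def shuffled_split(n_rows, test_size, seed=0):
    """Shuffle row indices once, then take a contiguous split.

    Same idea as random_shuffle() followed by
    train_test_split(..., shuffle=False).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_rows)      # one global shuffle of all row indices
    n_test = int(n_rows * test_size)   # size of the test partition
    return idx[n_test:], idx[:n_test]  # (train indices, test indices)


train_idx, test_idx = shuffled_split(100, test_size=0.05)
print(len(train_idx), len(test_idx))  # 95 5
```

Because the shuffle happens before the split, the split itself stays a cheap contiguous slice, which is exactly what `shuffle=False` does in Ray's `train_test_split`.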