'RayDatasetHFIterable' object has no attribute 'iter_arrow'

Hi,
I’m using Ray for distributed training of Hugging Face Transformers models. Recently I’ve been getting an error while training my model with trainer.fit:
2023-06-21 06:33:41,317	ERROR trial_runner.py:1062 -- Trial HuggingFaceTrainer_77970_00000: Error processing event.
ray.exceptions.RayTaskError(AttributeError): ray::_Inner.train() (pid=9689, ip=10.1.235.136, repr=HuggingFaceTrainer)
  File "/databricks/python/lib/python3.9/site-packages/ray/tune/trainable/trainable.py", line 368, in train
    raise skipped from exception_cause(skipped)
  File "/databricks/python/lib/python3.9/site-packages/ray/train/_internal/utils.py", line 54, in check_for_failure
    ray.get(object_ref)
ray.exceptions.RayTaskError(AttributeError): ray::RayTrainWorker.RayTrainWorker_execute() (pid=9797, ip=10.1.235.134, repr=<ray.train._internal.worker_group.RayTrainWorker object at 0x7f35696f0d60>)
  File "/databricks/python/lib/python3.9/site-packages/ray/train/_internal/worker_group.py", line 31, in __execute
    raise skipped from exception_cause(skipped)
  File "/databricks/python/lib/python3.9/site-packages/ray/train/_internal/utils.py", line 129, in discard_return_wrapper
    train_func(*args, **kwargs)
  File "/databricks/python/lib/python3.9/site-packages/ray/train/huggingface/huggingface_trainer.py", line 417, in _huggingface_train_loop_per_worker
    trainer.train()
  File "/databricks/python/lib/python3.9/site-packages/transformers/trainer.py", line 1527, in train
    return inner_training_loop(
  File "/databricks/python/lib/python3.9/site-packages/transformers/trainer.py", line 1749, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/databricks/python/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/databricks/python/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 671, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/databricks/python/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 34, in fetch
    data.append(next(self.dataset_iter))
  File "/databricks/python/lib/python3.9/site-packages/transformers/trainer_pt_utils.py", line 804, in __iter__
    for element in self.dataset:
  File "/databricks/python/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 1344, in __iter__
    if self._formatting and (ex_iterable.iter_arrow or self._formatting.format_type == "arrow"):
AttributeError: 'RayDatasetHFIterable' object has no attribute 'iter_arrow'
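The traceback boils down to a version mismatch: newer releases of the datasets library read an iter_arrow attribute directly off the examples iterable, while Ray's RayDatasetHFIterable wrapper was written against an older datasets API and never defines it. A minimal sketch of that failure mode (the class name here is illustrative, not Ray's actual implementation):

```python
class OldStyleIterable:
    """Stand-in for an examples iterable built against the pre-iter_arrow API."""

    def __iter__(self):
        yield {"text": "example"}


it = OldStyleIterable()
try:
    _ = it.iter_arrow  # roughly what newer datasets does internally
except AttributeError as err:
    print(err)  # 'OldStyleIterable' object has no attribute 'iter_arrow'

# A version-tolerant check would avoid the crash:
has_arrow = getattr(it, "iter_arrow", None) is not None
print(has_arrow)  # False
```

This is why pinning datasets back to a version that predates the iter_arrow check (as suggested below) sidesteps the error.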

I’m loading the dataset using “ray.data.from_huggingface”. I was able to run the code without any issue, but since yesterday this error has popped up. I also tried loading from Parquet files. Can anyone help me with this issue?
Here is my environment:
Databricks: 12.2 LTS
Ray: 2.3.1

(reopening this since it is something multiple users have run into)

I believe this is an issue caused by an upgrade of the datasets package.

Hi, I have experienced the same issue with Ray 2.5.1 using the image rayproject/ray:2.5.1-py39-cu118. The workaround is to pin datasets==2.10.1 in the --runtime-env-json option of ray job submit. For instance:

ray job submit --address http://172.29.214.14:8265 --working-dir ./falcon/LLM-distributed-finetune-main/src/ --runtime-env-json='{"pip": ["torch==2.0.1","transformers==4.30.2","deepspeed==0.9.5", "accelerate==0.20.3", "peft", "bitsandbytes==0.39.1","datasets==2.10.1", "einops==0.6.1"]}' -- python finetune.py
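The same pin can also be expressed as a runtime_env dict if you start Ray from Python (e.g. via ray.init(runtime_env=...)) instead of ray job submit. A sketch, with versions mirroring the command above and included only for illustration:

```python
# Runtime environment pinning `datasets` below the release that broke
# RayDatasetHFIterable; pass this dict to ray.init(runtime_env=...).
runtime_env = {
    "pip": [
        "datasets==2.10.1",      # the workaround: pin to a compatible version
        "transformers==4.30.2",  # other pins mirror the job-submit example
        "torch==2.0.1",
    ]
}

print(runtime_env["pip"][0])  # datasets==2.10.1
```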