How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I followed the Hugging Face Transformers trainer tutorial.
I had working code that ran natively with the HF Transformers Trainer library, and I was experimenting with running it on Ray.
I created a torch dataset object and then created Ray datasets from it:
ray_train_ds = ray.data.from_torch(train_dataset)
ray_evaluation_ds = ray.data.from_torch(test_dataset)
I then created the trainer_init_per_worker wrapper around the HF Trainer.
scaling_config = ScalingConfig(num_workers=1, use_gpu=use_gpu)
trainer = HuggingFaceTrainer(
    trainer_init_per_worker=trainer_init_per_worker,
    scaling_config=scaling_config,
    datasets={"train": ray_train_ds, "evaluation": ray_evaluation_ds},
)
But when I call trainer.fit(), it fails with:
File "/usr/local/lib/python3.8/site-packages/ray/train/huggingface/_huggingface_utils.py", line 75, in __iter__
yield (0, {k: v for k, v in row.as_pydict().items()})
AttributeError: 'dict' object has no attribute 'as_pydict'
I did some debugging and noticed that on the master branch this single line has been changed from
yield (0, {k: v for k, v in row.as_pydict().items()})
to
yield (0, {k: v for k, v in row.items()})
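To illustrate why the two versions behave differently, here is a minimal sketch in plain Python. It is not Ray's actual code: `FakeWrappedRow`, `row_to_dict`, and `iter_rows` are hypothetical names I made up for the example, and I am assuming the only relevant difference is whether the row is a plain dict or an object exposing `as_pydict()`. In my case the rows arrive as plain dicts, which matches the error above.

```python
class FakeWrappedRow:
    """Hypothetical stand-in for a row type that exposes as_pydict()."""

    def __init__(self, data):
        self._data = dict(data)

    def as_pydict(self):
        return dict(self._data)


def row_to_dict(row):
    """Return a plain dict whether the row is a dict or a wrapped row."""
    if hasattr(row, "as_pydict"):
        return dict(row.as_pydict())
    # Plain dicts have no as_pydict(); calling it raises AttributeError.
    return dict(row.items())


def iter_rows(rows):
    """Mimic the (index, row_dict) tuples yielded by the line in question."""
    for row in rows:
        yield (0, row_to_dict(row))


plain_row = {"input_ids": [1, 2, 3], "labels": 0}
wrapped_row = FakeWrappedRow(plain_row)

results = list(iter_rows([plain_row, wrapped_row]))

# Reproduce the failure mode from the traceback on a plain dict.
try:
    plain_row.as_pydict()
    failed_on_plain_dict = False
except AttributeError:
    failed_on_plain_dict = True
```

Both kinds of rows end up as the same plain dict, while calling `as_pydict()` directly on a dict fails the same way as in the traceback.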
I think this might be a bug in 2.4.0, but I am curious: how did people train with it?