I am trying to train a model using the Hugging Face Transformers integration in Ray Train.
I created Torch datasets, then created Ray datasets from them as follows:
ray_train_ds = ray.data.from_torch(train_dataset)
ray_evaluation_ds = ray.data.from_torch(test_dataset)
Then I created the wrapper function trainer_init_per_worker
and passed it to HuggingFaceTrainer:
scaling_config = ScalingConfig(num_workers=1, use_gpu=use_gpu)
trainer = HuggingFaceTrainer(
    trainer_init_per_worker=trainer_init_per_worker,
    scaling_config=scaling_config,
    datasets={"train": ray_train_ds, "evaluation": ray_evaluation_ds},
)
result = trainer.fit()
I then get the following error:
File "/usr/local/lib/python3.8/site-packages/ray/train/huggingface/_huggingface_utils.py", line 75, in __iter__
yield (0, {k: v for k, v in row.as_pydict().items()})
AttributeError: 'dict' object has no attribute 'as_pydict'
I debugged this and found that in the current master branch, HuggingFaceTrainer
has been changed to subclass TransformersTrainer,
and that the line producing the error was also changed from yield (0, {k: v for k, v in row.as_pydict().items()})
to yield (0, {k: v for k, v in row.items()}).
Maybe it was a bug to begin with? But then how were people training with it already? Can anyone help me understand what's happening?
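To make the difference between the two versions of that line concrete, here is a minimal sketch in plain Python that does not use Ray at all. It assumes the rows reaching the iterator are plain dicts, which is what the AttributeError suggests. FakeTableRow and row_to_dict are hypothetical names made up for illustration and are not part of Ray's API.

```python
from typing import Any, Dict, Mapping


class FakeTableRow:
    """Hypothetical stand-in for a row object that exposes as_pydict().

    This is NOT a Ray class; it only mimics the method the v2.4.0 line calls.
    """

    def __init__(self, data: Mapping[str, Any]) -> None:
        self._data = dict(data)

    def as_pydict(self) -> Dict[str, Any]:
        return dict(self._data)


def row_to_dict(row: Any) -> Dict[str, Any]:
    """Hypothetical helper: return a plain dict for either kind of row."""
    if isinstance(row, Mapping):
        # Plain dict rows: this is the case the master-branch line handles
        # by calling row.items() directly.
        return dict(row)
    if hasattr(row, "as_pydict"):
        # Row objects with as_pydict(): the case the v2.4.0 line expects.
        return dict(row.as_pydict())
    raise TypeError(f"Unsupported row type: {type(row).__name__}")


plain_row = {"input_ids": [101, 102], "labels": 1}
wrapped_row = FakeTableRow(plain_row)

# Calling as_pydict() on a plain dict reproduces the error from the traceback.
try:
    plain_row.as_pydict()
except AttributeError as exc:
    error_message = str(exc)

# Both row kinds convert to the same plain dict through the helper.
converted_plain = row_to_dict(plain_row)
converted_wrapped = row_to_dict(wrapped_row)
```

If this reading is right, the v2.4.0 line only works when each row is an object with as_pydict(), while the master-branch line works when each row is already a dict. That would explain why the same trainer code can succeed for some datasets and fail for others, but it is a guess based on the traceback and not something confirmed in the Ray source.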
For reference, that's the line of code in v2.4.0 that is producing the error, and that's how it was changed in the master branch.