How to use Ray to train a HuggingFace tokenizer in a distributed way?

How can I use Ray to train a HuggingFace tokenizer with the `tokenizer.train_from_iterator(…)` API?

You can create an iterator with the `ray.data.Dataset.iter_batches` API, but that does not give you distributed training: the batches are consumed sequentially on a single process (see the sketch below).
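For reference, a minimal sketch of that sequential (non-distributed) approach, assuming a line-delimited plain-text corpus and a Ray Data version that supports `batch_format="numpy"`; the corpus path and vocabulary size are placeholders:

```python
import ray
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# Hypothetical corpus location: a directory of plain-text files, one example per line.
ds = ray.data.read_text("path/to/corpus/")

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=30_000, special_tokens=["[UNK]"])  # placeholder vocab size

def batch_iterator(batch_size=1_000):
    # read_text puts each line in a "text" column; with batch_format="numpy"
    # every batch is a dict mapping column names to NumPy arrays.
    for batch in ds.iter_batches(batch_size=batch_size, batch_format="numpy"):
        yield batch["text"].tolist()

# train_from_iterator pulls the batches one by one on the driver process,
# so the tokenizer training itself is not spread across the cluster.
tokenizer.train_from_iterator(batch_iterator(), trainer=trainer)
tokenizer.save("tokenizer.json")
```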