How to train BERT on a Ray cluster?

I would like to train a BERT language model from scratch, using several machines with GPU cards (a single GPU with a lot of memory is very expensive, but several 8 GB cards are available). I am considering splitting batches across the machines and applying asynchronous updates.
Does anybody have experience with Ray clusters in this area, or can anyone suggest a tutorial on distributed BERT training?
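For context, the batch-splitting idea above is standard synchronous data parallelism: each worker computes gradients on its shard of the batch, and averaging those gradients is mathematically equivalent to one step on the full batch. A minimal stdlib-only sketch of that equivalence (the names `split_batch`, `fake_gradient`, and `averaged_step` are illustrative, not Ray APIs):

```python
def split_batch(batch, num_workers):
    """Split a batch into roughly equal shards, one per worker."""
    shard_size = (len(batch) + num_workers - 1) // num_workers
    return [batch[i:i + shard_size] for i in range(0, len(batch), shard_size)]

def fake_gradient(shard):
    """Stand-in for a per-shard backward pass: here, just the shard mean."""
    return sum(shard) / len(shard)

def averaged_step(batch, num_workers):
    """Average per-worker gradients, as synchronous data-parallel SGD would."""
    grads = [fake_gradient(s) for s in split_batch(batch, num_workers)]
    return sum(grads) / len(grads)

batch = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
# With equal-sized shards, the averaged per-shard result equals the
# full-batch result, which is why splitting batches across GPUs works.
print(averaged_step(batch, num_workers=4))
print(fake_gradient(batch))
```

In a real setup the averaging is done by an all-reduce over the actual gradients, but the arithmetic is the same.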


Hmm, maybe you could consider using RaySGD (here is an example).
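To illustrate the asynchronous-update pattern the question mentions (separately from RaySGD itself), here is a stdlib-only sketch of a parameter-server loop: workers push gradients to a shared store as soon as they are ready, without waiting for each other. In a real Ray cluster the server role would be played by a Ray actor and the workers by remote tasks; threads stand in for them here, and `ParamServer` is a made-up name for illustration:

```python
import threading

class ParamServer:
    """Holds one scalar parameter; applies pushed gradients under a lock."""
    def __init__(self, lr=0.1):
        self.param = 0.0
        self.lr = lr
        self.updates = 0
        self._lock = threading.Lock()

    def apply_gradient(self, grad):
        # Asynchronous SGD: each update is applied immediately,
        # without synchronizing with the other workers.
        with self._lock:
            self.param -= self.lr * grad
            self.updates += 1

def worker(server, grads):
    """Each worker pushes its gradients as soon as they are computed."""
    for g in grads:
        server.apply_gradient(g)

server = ParamServer()
threads = [threading.Thread(target=worker, args=(server, [1.0] * 5))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# 4 workers x 5 gradients = 20 asynchronous updates
print(server.updates, server.param)
```

Note that asynchronous updates trade gradient staleness for throughput; synchronous all-reduce (what RaySGD and most modern setups use) is usually the safer default for BERT pretraining.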

Alternatively, we also have a PyTorch Lightning integration, ray-project/ray_lightning on GitHub (PyTorch Lightning distributed accelerators using Ray), that you can use to scale training.
