How to use a BERT tokenizer in a Ray cluster?

Hello Team,


I have a 3-node cluster set up. I am doing text pre-processing that also needs a BERT tokenizer, and I want to distribute the tokenization step too.
I am looking for suggestions on how to distribute it; right now this is the only piece still running locally.

  • I am thinking of using Ray Serve (ray[serve])
  • Running a separate Ray service only for tokenization, though I would like to avoid this because of the network overhead
  • Serializing the tokenizer with cloudpickle

Thanks in advance.

Consider it resolved.

Awesome; what was your resolution?