Model Parallelism in Ray

Hi folks, it seems like Ray Train focuses on distributed training with data parallelism. I am wondering if there is a use case with model parallelism. In our specific use-case, we are training large-scale embeddings, and these typically require model parallelism due to a large embedding matrix that cannot fit in the memory of one machine.

That’s a great use case. You mentioned offline that you were looking at Pytorch Biggraph in particular?

Thanks, Richard, for the quick response. Yes, in particular, we are looking to train node embeddings on large graphs, and use Pytorch-BigGraph as a framework for training over a Ray cluster.

Officially, we don’t have any pre-existing examples. However, it should work fine (given that Ray Train just constructs the process group for you).

We would be happy to help guide you through the implementation, if you have any particular questions.

(and also subsequently highlight your use case as a successful example down the road!)

Of course! Would be happy to contribute to the ray ecosystem in any which way.

The fact that Ray Train just constructs the process group makes sense. However, the examples provided in the documentation give an impression that the API is limited and works mostly for the data parallel case. After decoding PyTorch-BigGraph, I think I can train with model parallelism on Ray if only I can replace init_process_group from torch.distributed by an equivalent ray function. Is there a similar API in Ray? Thanks!

Can you post an example of what you want to do?