Several questions about DL training (e.g. AlexNet with PyTorch)

JF-D · July 10, 2021, 2:17pm

Can Ray automatically transfer tensors between CPU memory and GPU memory without explicitly calling tensor.cuda()?
How does Ray handle cross-GPU tensor communication? Can I use NCCL for high-performance training?
Can I split a model into multiple parts and have Ray schedule the parts onto different GPUs and train the model automatically?
I followed the Parameter Server example from the Ray v1.4.1 documentation and added the @ray.remote(num_gpus=1) decorator to the server and the workers, but the throughput is quite low. How can I use Ray for high-performance training of DL models (like AlexNet) with PyTorch?

Thanks a lot!
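The low parameter-server throughput can be put in rough numbers. The following is a back-of-the-envelope sketch, not a measurement, and every input is an assumption: torchvision-style AlexNet layer shapes with 1000 output classes, float32 parameters, and a full synchronization every step. It compares a single parameter server, which receives one full gradient from each worker and sends one full copy of the weights back, with the per-worker traffic of a ring all-reduce, one of the algorithms NCCL can use for data-parallel training. The helper names are made up for this illustration.

```python
# Back-of-the-envelope communication estimate for data-parallel AlexNet.
# Assumptions (not from the thread): torchvision-style AlexNet shapes,
# float32 parameters, full-model sync every step, no compression or overlap.

BYTES_PER_FLOAT32 = 4


def conv_params(c_in, c_out, k):
    """Weights plus bias of a Conv2d layer with a square k x k kernel."""
    return c_in * c_out * k * k + c_out


def linear_params(n_in, n_out):
    """Weights plus bias of a fully connected layer."""
    return n_in * n_out + n_out


# Layers in forward order: (name, parameter count).
ALEXNET_LAYERS = [
    ("conv1", conv_params(3, 64, 11)),
    ("conv2", conv_params(64, 192, 5)),
    ("conv3", conv_params(192, 384, 3)),
    ("conv4", conv_params(384, 256, 3)),
    ("conv5", conv_params(256, 256, 3)),
    ("fc6", linear_params(256 * 6 * 6, 4096)),
    ("fc7", linear_params(4096, 4096)),
    ("fc8", linear_params(4096, 1000)),
]


def param_server_bytes_per_step(n_params, n_workers):
    """Bytes the single server receives plus sends per step: one full
    gradient in from every worker and one full weight copy out to every worker."""
    model_bytes = n_params * BYTES_PER_FLOAT32
    return 2 * n_workers * model_bytes


def ring_allreduce_bytes_per_worker(n_params, n_workers):
    """Bytes each worker sends in a ring all-reduce: 2 * (N - 1) / N of the model."""
    model_bytes = n_params * BYTES_PER_FLOAT32
    return 2 * (n_workers - 1) * model_bytes // n_workers


if __name__ == "__main__":
    total = sum(n for _, n in ALEXNET_LAYERS)
    print(f"AlexNet parameters: {total:,}")  # 61,100,840
    for workers in (2, 4, 8):
        ps_gb = param_server_bytes_per_step(total, workers) / 1e9
        ring_gb = ring_allreduce_bytes_per_worker(total, workers) / 1e9
        print(f"{workers} workers: server moves {ps_gb:.2f} GB/step, "
              f"each ring worker sends {ring_gb:.2f} GB/step")
```

Under these assumptions AlexNet has about 61.1 million parameters (roughly 244 MB in float32), so the server's traffic grows linearly with the number of workers, while each ring participant sends less than twice the model size no matter how many GPUs join. Real throughput also depends on link bandwidth, overlap of compute and communication, and any GPU-to-CPU copies the training code performs, none of which are modeled here.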

sangcho · July 12, 2021, 5:55pm

@amogkam do you mind answering his questions?

amogkam · July 12, 2021, 6:07pm

This has already been resolved in the linked GitHub issue!
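For the model-splitting question, the first practical step is deciding where to cut. A minimal sketch, assuming you choose the split yourself rather than relying on any automatic partitioning: given a per-layer cost (parameter count if GPU memory is the bottleneck, or measured per-layer time if speed is), a small dynamic program divides the layers into contiguous stages, one per GPU, so that the most expensive stage is as cheap as possible. The function name and the example costs are hypothetical.

```python
from typing import List


def split_into_stages(costs: List[int], n_stages: int) -> List[List[int]]:
    """Split layer indices into n_stages contiguous, non-empty groups,
    minimizing the largest per-stage cost (linear-partition dynamic program).
    Hypothetical helper for illustration only."""
    n = len(costs)
    if not 1 <= n_stages <= n:
        raise ValueError("need 1 <= n_stages <= number of layers")

    # prefix[i] = total cost of layers 0..i-1, so a stage (j, i] costs prefix[i] - prefix[j].
    prefix = [0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    inf = float("inf")
    # best[k][i]: smallest possible max-stage cost when the first i layers form k stages.
    best = [[inf] * (n + 1) for _ in range(n_stages + 1)]
    # cut[k][i]: where the last of those k stages starts in the optimal split.
    cut = [[0] * (n + 1) for _ in range(n_stages + 1)]
    best[0][0] = 0

    for k in range(1, n_stages + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                cand = max(best[k - 1][j], prefix[i] - prefix[j])
                if cand < best[k][i]:
                    best[k][i] = cand
                    cut[k][i] = j

    # Walk the cut table backwards to recover the stage boundaries.
    stages, i = [], n
    for k in range(n_stages, 0, -1):
        j = cut[k][i]
        stages.append(list(range(j, i)))
        i = j
    return stages[::-1]
```

For example, `split_into_stages([1, 2, 3, 4, 5, 6], 3)` returns `[[0, 1, 2], [3, 4], [5]]`, whose heaviest stage costs 9. In a Ray program each stage could then live in its own actor declared with `@ray.remote(num_gpus=1)`, with activations passed forward from one actor to the next; that wiring and any micro-batch pipelining are left out of this sketch.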
