Yeah, I think the main problem is that the optimizer you create outside of the operator will have buffers that live on the CPU (rather than the GPU).
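For context, here is a minimal plain-PyTorch sketch of how that happens (this is not Ray SGD's internals, and it assumes a CUDA device is available): Adam allocates its state buffers lazily on whatever device the parameters are on at the first `step()`, so an optimizer that is created and stepped before the model is moved to the GPU is left with CPU-resident state. The `move_optimizer_state` helper at the end is hypothetical, just to show one possible workaround.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                      # parameters start out on the CPU
optimizer = torch.optim.Adam(model.parameters())

# One step on CPU: Adam lazily allocates its exp_avg / exp_avg_sq state
# buffers on the same device as the parameters, i.e. the CPU.
loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()

model.cuda()                                  # parameters move to the GPU ...
state = optimizer.state[next(model.parameters())]
print(state["exp_avg"].device)                # ... but the optimizer state stays on CPU

# Hypothetical workaround (not a Ray SGD API): move the optimizer state
# onto the same device as the parameters it tracks.
def move_optimizer_state(optimizer, device):
    for param_state in optimizer.state.values():
        for key, value in param_state.items():
            if torch.is_tensor(value):
                param_state[key] = value.to(device)

move_optimizer_state(optimizer, "cuda")
```

Creating the optimizer inside the operator, after the model has been moved to the GPU, avoids the mismatch in the first place.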
Shouldn’t we expect Ray SGD to have lower performance than torch.distributed, as Ray incurs overhead (scheduling, updating system state, etc.)?
Is there any tool to measure the various metrics, such as the time spent on data transfer?
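For what it's worth, here is a minimal sketch of one way to do this with the standard PyTorch profiler (not a Ray-specific tool): the profiler records the CUDA memcpy kernels (e.g. "Memcpy HtoD") alongside compute kernels, so host-to-device transfer time shows up in the summary table.

```python
import torch
from torch.profiler import profile, ProfilerActivity

x = torch.randn(1024, 1024)                   # tensor on the host

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    y = x.cuda()                              # host-to-device transfer we want to time
    z = y @ y                                 # some GPU compute for comparison
    torch.cuda.synchronize()

# Memcpy entries appear in the table next to the matmul kernels.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```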