PyTorch Lightning Trainable API Compatibility

Would it be possible to get PyTorch Lightning modules working with the Trainable API as well? I find the Trainable API more robust, since it makes it easy to control different aspects of checkpointing and stopping.

I played around with RaySGD, and it's quite similar to PTL modules, albeit more granular. The most direct approach, I think, would be a TorchTrainer compatibility class for Lightning. Or we could modify the PyTorch Lightning Trainer so that its backend can be set to Ray? The sketch below shows roughly what I have in mind.
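For context, something along these lines is what I mean by driving a Lightning module through the Trainable API (just a rough sketch; `LitModel` and the manual optimization loop are placeholders, not a real integration):

```python
# Rough sketch: wrapping a LightningModule in a Ray Tune Trainable so that
# Tune controls checkpointing and stopping. `LitModel` is a hypothetical
# LightningModule; the training loop is deliberately simplified.
import os
import torch
from ray import tune


class LightningTrainable(tune.Trainable):
    def setup(self, config):
        self.model = LitModel(**config)              # hypothetical LightningModule
        self.optimizer = self.model.configure_optimizers()
        self.train_loader = self.model.train_dataloader()

    def step(self):
        # One Tune "step" == one epoch; Tune's stopping criteria decide when to end.
        total_loss = 0.0
        for batch_idx, batch in enumerate(self.train_loader):
            self.optimizer.zero_grad()
            loss = self.model.training_step(batch, batch_idx)
            loss.backward()
            self.optimizer.step()
            total_loss += loss.item()
        return {"loss": total_loss / len(self.train_loader)}

    def save_checkpoint(self, checkpoint_dir):
        # Tune calls this on whatever checkpoint schedule you configure.
        path = os.path.join(checkpoint_dir, "model.pt")
        torch.save(self.model.state_dict(), path)
        return path

    def load_checkpoint(self, checkpoint_path):
        self.model.load_state_dict(torch.load(checkpoint_path))
```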

Hi @Raed, we’re working on providing a PyTorch Lightning trainer backend for Ray. We’ll be sure to update you once it’s ready!

cc @amogkam


Hey @Raed, we just finished implementing a Ray backend for distributed PyTorch Lightning training here: GitHub - ray-project/ray_lightning_accelerators (PyTorch Lightning Distributed Accelerators using Ray).

The package introduces two new PyTorch Lightning accelerators, one for DDP and one for Horovod training on Ray, for quick and easy distributed training. It also integrates with Ray Tune for distributed hyperparameter tuning.
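Roughly, usage looks like the sketch below; treat the class and argument names (`RayAccelerator`, `num_workers`, `use_gpu`) as approximate, since the README in the repo is the authoritative reference for the exact API:

```python
# Minimal usage sketch: swap in the Ray accelerator on an existing
# LightningModule. Class and argument names are assumptions; check the
# ray_lightning_accelerators README for the exact API.
import pytorch_lightning as pl
from ray_lightning_accelerators import RayAccelerator

model = MyLightningModule()          # your existing LightningModule

trainer = pl.Trainer(
    max_epochs=10,
    # Distribute training across 4 Ray workers instead of local processes.
    accelerator=RayAccelerator(num_workers=4, use_gpu=False),
)
trainer.fit(model)
```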

Please check it out; we’d love to hear any feedback 🙂
