Would it be possible to get PyTorch Lightning modules working with the Trainable API as well? I find that approach more robust, since it gives you fine-grained control over checkpointing and stopping.
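For example, something roughly like this (a minimal, untested sketch; `ToyModule` and the random dataset are just placeholders, and a real integration would also need to route Lightning's callbacks and validation loop):

```python
import os

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
from ray import tune


class ToyModule(pl.LightningModule):
    """Tiny stand-in LightningModule so the sketch is self-contained."""

    def __init__(self, config):
        super().__init__()
        self.net = nn.Linear(4, 2)
        self.lr = config.get("lr", 1e-3)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.net(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.lr)


class LightningTrainable(tune.Trainable):
    """One Tune step = one manual epoch driven through the module's hooks."""

    def setup(self, config):
        self.module = ToyModule(config)
        self.optimizer = self.module.configure_optimizers()
        data = TensorDataset(torch.randn(64, 4), torch.randint(0, 2, (64,)))
        self.loader = DataLoader(data, batch_size=config.get("batch_size", 16))

    def step(self):
        self.module.train()
        total = 0.0
        for i, batch in enumerate(self.loader):
            self.optimizer.zero_grad()
            loss = self.module.training_step(batch, i)
            loss.backward()
            self.optimizer.step()
            total += loss.item()
        return {"loss": total / len(self.loader)}

    def save_checkpoint(self, checkpoint_dir):
        # Tune calls this on its own schedule -- this is the control I mean.
        path = os.path.join(checkpoint_dir, "model.pt")
        torch.save(self.module.state_dict(), path)
        return path

    def load_checkpoint(self, checkpoint_path):
        self.module.load_state_dict(torch.load(checkpoint_path))
```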
I played around with RaySGD and it is quite similar to PTL modules, albeit more granular. The most direct approach, I think, would be a TorchTrainer compatibility class for Lightning? Or we could modify the PyTorch Lightning Trainer so that its backend can be set to Ray?
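To sketch what I mean by a compatibility class: RaySGD's creator functions could be derived from a LightningModule's own hooks. This is untested, `MyLightningModule` is a placeholder, and Lightning's `training_step` logic would still need a custom TrainingOperator to be faithful:

```python
import torch
from ray.util.sgd import TorchTrainer


def model_creator(config):
    # A LightningModule is a plain nn.Module, so RaySGD can train it directly.
    return MyLightningModule(config)  # placeholder class


def optimizer_creator(model, config):
    # Reuse the optimizer the module already defines.
    return model.configure_optimizers()


def data_creator(config):
    module = MyLightningModule(config)  # placeholder class
    return module.train_dataloader(), module.val_dataloader()


def loss_creator(config):
    # Stand-in: Lightning normally computes loss inside training_step.
    return torch.nn.CrossEntropyLoss()


trainer = TorchTrainer(
    model_creator=model_creator,
    data_creator=data_creator,
    optimizer_creator=optimizer_creator,
    loss_creator=loss_creator,
    num_workers=4,
)
trainer.train()
```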
The package introduces two new PyTorch Lightning accelerators, one for DDP and one for Horovod training on Ray, for quick and easy distributed training. It also integrates with Ray Tune for distributed hyperparameter tuning.
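A minimal sketch of the kind of usage this enables; the import path and class name here (`ray_lightning`, `RayAccelerator`) are illustrative stand-ins rather than the package's confirmed interface:

```python
import pytorch_lightning as pl
from ray_lightning import RayAccelerator  # illustrative import path/name

model = MyLightningModule()  # placeholder for your own LightningModule
trainer = pl.Trainer(
    max_epochs=10,
    # Distribute DDP training across 4 Ray workers; the Horovod accelerator
    # would slot in the same way for the Horovod backend.
    accelerator=RayAccelerator(num_workers=4, use_gpu=True),
)
trainer.fit(model)
```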
Please check it out; I'd love to hear any feedback!