Hi all! I have a deep learning model that I train with a PyTorch Lightning trainer configured via a Hydra config. Until now I've run it on a single node with 4 GPUs, and I would now like to run it on an on-premises cluster. My question is: is there a way to use RaySGD with minimal changes to the Hydra config? Thanks in advance.
Hey @malloc, great question, and sorry for the delay; it was caused by the question being left "uncategorized". It helps if you set a category (e.g. "Tune") when you post a new question, so we can find it more easily and assign the right person to answer it.
@rliaw, could someone from the Tune/SGD team answer this one? Thanks!
Hmm, instead I would look at GitHub - ray-project/ray_lightning: PyTorch Lightning distributed accelerators using Ray, which lets you scale out your Lightning trainer on the cloud (or on your own cluster)!