Hi all! I have a deep learning model that I train with a PyTorch Lightning trainer configured via a Hydra config. Until now I've run it on a single node with 4 GPUs, and I would now like to run it on an on-premises cluster. My question is: is there a way to use RaySGD with minimal changes to the Hydra config? Thanks in advance.
Hey @malloc, great question, and sorry for the delay; it was caused by the question being left "uncategorized". It helps if you set a category (e.g. "Tune") when you post a new question, so we can find it more easily and assign the right person to answer it.
@rliaw, could someone from the Tune/SGD team answer this one? Thanks!
Hmm, instead I would look at GitHub - ray-project/ray_lightning: PyTorch Lightning distributed accelerators using Ray, which lets you scale out your Lightning trainer on the cloud (or on your own cluster)!