Distributed Training & Distributed Tuning using Ray Tune, PLT, Ray Lightning

Hi, I am using PyTorch Lightning (1.5.10), Ray Lightning (0.2), and Ray Tune (1.10.0) to distribute training and tuning. From the documentation, it is unclear to me how to do resource allocation with RayPlugin.

Here (ray_lightning/ray_ddp.py at 65f497a3c8bedb2f24bf04a5dbf0ea62b5bcb4d6 · ray-project/ray_lightning · GitHub) it says to set GPUs in the PyTorch Lightning Trainer to a value > 0,

and here (ray_lightning/ray_ddp.py at 65f497a3c8bedb2f24bf04a5dbf0ea62b5bcb4d6 · ray-project/ray_lightning · GitHub) the example says NOT to specify resources in the Trainer.

Which one is right? I am able to run tuning experiments in parallel but unable to distribute training. I have 5 worker nodes with 8 GPUs on each node.
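
For context, here is a minimal sketch of the setup I am trying. The LightningModule (`MyLightningModule`), the metric names, and the numbers (`NUM_WORKERS`, `num_samples`, learning-rate range) are placeholders, and I am assuming `get_tune_resources` from `ray_lightning.tune` is the intended helper for reserving per-trial resources:

```python
import pytorch_lightning as pl
from ray import tune
from ray_lightning import RayPlugin
from ray_lightning.tune import TuneReportCallback, get_tune_resources

NUM_WORKERS = 8  # placeholder: Ray workers (GPUs) per trial


def train_model(config):
    # MyLightningModule is a placeholder for my actual LightningModule.
    model = MyLightningModule(config)
    trainer = pl.Trainer(
        max_epochs=10,
        # This is the unclear part: should gpus also be set on the Trainer,
        # or only use_gpu=True on the plugin?
        plugins=[RayPlugin(num_workers=NUM_WORKERS,
                           num_cpus_per_worker=1,
                           use_gpu=True)],
        callbacks=[TuneReportCallback({"loss": "val_loss"},
                                      on="validation_end")],
    )
    trainer.fit(model)


analysis = tune.run(
    train_model,
    config={"lr": tune.loguniform(1e-4, 1e-1)},
    num_samples=5,  # roughly one trial per worker node
    # Reserve NUM_WORKERS GPU workers for each trial.
    resources_per_trial=get_tune_resources(num_workers=NUM_WORKERS,
                                           use_gpu=True),
)
```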

How severe does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Referring to @xiaowei / @rliaw here.